Preface



Let us assume that the mission of a preface is to install some context, some 
common ground where author and reader meet. The usual practice is to say a 
few words about the motivation of the book and to sketch how it came into be- 
ing, evoking the scientific background, the places, or the people which or who 
played a role in its history. Some of the theories, the places, or some of the 
persons named may be known to the reader, and before we know where we are, 
we have some experiences and acquaintances in common. For the audience, 
the environment of the book becomes more familiar and more encouraging 
ground for sharing knowledge. 

As with many manuscripts, this book has a long history. When I started the 
investigation which forms the original contribution to this book, my strongest 
incentive was the idea that it must be possible, by applying methods from 
cognitive psychology, to explain professional summarizing (abstracting and 
indexing) better. I was unhappy with the know-how that I was able to teach my
students, and I was surprised that colleagues found it appropriate. I anticipated
that interdisciplinary methods could be combined in a cognitive science framework to
create a new view of summarizing. On the horizon I saw the full-text databases 
that would ease the application of and increase the need for summarizing 
systems. 

At the time of publication, the field of (computational) summarizing is 
speeding up under newly grown user demands, against the background of full- 
text databases and computer networks. Since we are still only just beginning to 
understand what happens during summarizing, this book can only provide an 
overture to expanding our knowledge. A finale-style record of well-established 
theories would presuppose a field with long-standing merits that we do not 
have. Attentive readers will notice the overture character of this book in many 
respects. One very visible feature is the absence of any far-reaching conclu- 
sions. Those who doubt whether such an intellectual attitude is legitimate are 
reminded that staging knowledge and questions about summarizing conforms to 
a good theatrical tradition. Listen for instance to Bertolt Brecht. At the end of
his play The Good Woman of Sezuan the actors, in consternation about the real-world
view of the story which they have presented on the stage, urge the audience to
provide a better conclusion, which must exist somewhere: 

Verehrtes Publikum, los, such dir selbst den Schluß!

Es muß ein guter da sein, muß, muß, muß!

For the author, this publication marks the end of a decade. After my first arti- 
cles about abstracting around 1985, I delved into a period of empirical research 
from 1989 to 1993. During this time, the place of action was at two Saarbrücken
institutes, the German Institute for Artificial Intelligence and the Information
Science Institute IAI. From there, field research took me and the investigation
to Los Angeles, California, and College Park, Maryland. Back in Hanover,
when the first research grant had finished, the work split in two directions. A first
German manuscript was produced and went through a reshaping procedure
resulting in this book. In parallel, from 1994 to 1996, the summarizing model was
implemented as a multimedia simulation system. The SimSum system is 
included in this book. Intermediate aims on the path towards implementation 
and publication were the Dagstuhl seminar Summarizing text for intelligent 
communication in December 1993, and the special issue Summarizing text of 
Information Processing & Management in 1995. 

During the final stage at Hanover, the manuscript wandered between two 
worlds. It saw not only the obvious Mac and desk environment at home and at 
the Information and Communication Department of the University for the Ap- 
plied Sciences, but in the Mac it also shared long stays at the hospital of the 
Medical School. Slowly but tenaciously, the manuscript grew. 

Many colleagues have accompanied the research and its conversion to pub- 
lication. It would take too long and risk too many errors to list them all. So I 
beg some individuals who have made a major contribution to accept my thanks 
also on behalf of those many who remain unmentioned: 

• the six summarizing experts who with their knowledge laid the basis for 
the empirical investigation: Harold Borko, Edward Cremmins, Ingetraud 
Dahlberg, Andreas Gerards, Marliese Gunther, and Hannelore Schott 

• the SimSum project team whose work is integrated in this volume: Kai 
Haseloh, Jens Mueller, Simone Peist, Irene Santini de Sigel, Alexander 
Sigel, Elisabeth Wansorra, Jan Wheeler, and Brünja Wollny

• Wolfgang Wahlster, who saw from the early beginnings the impact of the 
research and helped to put it on track 

• Karen Sparck Jones, who has been a reliable partner and chairperson at the 
Dagstuhl seminar and while preparing the special issue of Information 
Processing & Management 

• Hans Wossner, who has accompanied the publication project steadfastly, 
constructively, and with his particular brand of friendliness and openness 
through a number of years and situations. He has done much more than an 
editor is expected to do

• the Hanover hematologists Arnold Ganser and Bernd Hertenstein. They
enabled me to finish the manuscript, by transplanting my sister’s hemopoietic
cells.

I thank the German Science Foundation (DFG) for supporting the empirical re- 
search (grant En 186/1-3) and the German Federal Ministry for Education and 
Research (BMBF) for funding the implementation (grant F0916.00). 

For me, writing this preface marks the end of one adventure and the start of a 
new one. Not only is the book open-ended, there is also a change in casting 
and roles now that the audience enter the scene. The author is curious to see 
who will arrive. I hope everybody has fun with the book and the simulation, and 
particularly in the event of trouble, call, write, or send an email! 



Hanover, May 1998 



Brigitte Endres-Niggemeyer



1 Introduction



Summarizing - an interdisciplinary account for everybody 

This book submits to its readers what we know about summarizing, i.e., the re- 
duction of mostly textual information to its most essential points. Summarizing 
is interesting on account of its practical use, but also as an important 
achievement of human cognition. First and foremost, summarizing appears 
here as the summarization of texts in natural language. This is due to the fact 
that most research has treated summarizing of written text. Consequently, we 
know much more about the summarization of natural language texts than, say, 
summarizing visual media. However, we summarize representations, not lan- 
guage. Nevertheless, our state of knowledge about summarizing makes us pre- 
sent summarizing as a cognitive process normally using a linguistic represen- 
tation. This book is no exception to the rule, but it makes some modest at- 
tempts to extend the view to other media than written text. 

Several scientific disciplines such as psychology, linguistics, artificial intel- 
ligence, and information science contribute to the study of summarizing. One 
focus of interest is how people go about summarizing, i.e., what intellectual 
work they do. The other very practical interest is to develop automatic proce- 
dures for summarizing, for instance, the huge amounts of information in com- 
puter networks. These two scientific tasks are not independent of each other. 
They are most at home in an interdisciplinary research field which has been 
dubbed cognitive science. It integrates psychology, linguistics, and artificial 
intelligence, together with other disciplines that contribute to the investigation 
of the human mind. The interdisciplinarity argument also serves the interests of 
teachers, students, and practitioners of summarizing. What they need is good 
know-how on methods for immediate application. 

Since summarizing is an interdisciplinary topic, its presentation must cater 
to readers with different backgrounds. Then the next question is whether the re- 
sulting explanation cannot also serve people from other contexts who are inter- 
ested in summarizing, for example, journalists, economists, or teachers. After 
all, summarizing occurs in many professions as part of everyday tasks, so that
people might conceivably be interested in understanding more comprehen- 
sively what they do. An interdisciplinary presentation aggregates knowledge 
from different research lines and by necessity smooths out at least some of 
their peculiarities. Summarizing is a complicated cognitive process and needs 
a high investment in presentation. So why not provide some core notions of 
cognitive science and communication to set up a minimum of common inter- 
disciplinary ground, and address this account of summarizing to everybody who 
is interested in it? Why not let mechanical engineers, teachers, economists, 
mathematicians, or biologists join the cognitive science party dealing with 
summarizing? 



A print medium and a simulation system 

The explanation of summarizing is distributed over two complementary media, a
printed book and a simulation system on CD-ROM called SimSum (Simulation
of Summarizing). Figure 1.1 shows how book and CD-ROM simulation are
interrelated. The print presentation is more comprehensive, while the simulation
goes in a movie-like style through real expert summarizing sequences. Such
sequences are difficult to imagine without appropriate support, and here a
computer system is superior to print media. The effect is known in principle
from flight simulators.

Whereas a computer system can rearrange what it presents on a screen, the 
printed page must leave to the eye and the imagination of the beholder every- 
thing that moves, helping at best by a static presentation of stages as used in 
the illustrated broadsheet section of the chapter Professional summarizing. An 
ongoing process is much easier to follow, however, if the recipient can concen- 
trate on the process itself, without bothering with the rearrangement of data as 
the process moves on. 

A combined medium that addresses readers with varied backgrounds and in- 
terests must accommodate non-sequential and partial reading motivated by 
personal interest. An obvious non-sequential reading strategy for somebody in- 
terested in automatic summarizing might start at the (last) chapter about 
computational approaches and then expand the scope to human summarizing, 
which is explained earlier. Readers who prefer computerized media can use 
SimSum as a first access if they are familiar with current cognitive science 
concepts. The simulation includes an explanatory hypertext of its own, such 
that readers can learn there essentials of the system and the empirical ap- 
proach behind it. After that, they may want to see more evidence and back- 
ground and turn to the printed presentation. People who want to know more 
about their own summarizing or who want to teach it better will possibly not 
spend much time with the computational approaches. This list might be con- 
tinued. For the author, the consequence is to organize the presentation for vari- 
able reception strategies. 

In sections which tend to be accessed selectively, the local information must
more than elsewhere suffice for a first understanding. Especially in the empiri- 
cal description of professional summarizing, local summaries help readers to 
understand as they change the granularity of reading, by switching from the de- 
tailed study of one item to a global view of the next. The detail of description 
goes up and down such that the reader, whatever reception path (s)he takes, 
encounters in reasonable time an example of summarizing which is explained 
in detail, while other examples are described more succinctly. 

Fig. 1.1. Overview of Summarizing Information



Figure 1.1 supplies some suggestions for choosing the parts of the book that 
may be most useful to the reader. People from outside cognitive science may 
find the short introduction to situated communication and cognition helpful, 
because it explains basic concepts which are not taught in all disciplines. For 
other readers, Chap. 2 may simply serve to secure the common knowledge
background of the discussion. Psychologists and teachers are possibly best
served with the theories of normal summarizing in Chap. 3 and by some study 
of expert summarization in Chap. 4. Readers who dislike technical detail are 
invited to skip it. For them, the author has included more abstract descriptions 
that focus on principles of summarization processes. Those who want or need a 
precise image of summarization skills will take the time to study process sam- 
ples. Cognitive psychologists may do so out of an empirical interest, computer 
scientists may look for features that are implementable, teachers for examples 
to present to their students, and practitioners will compare their own approach 
to that of the experts who contributed to this book. 



Content overview 

The second chapter, on communication and cognition, is dedicated to readers
who are not familiar with cognitive science. Readers who know the basics of 
situated communication and of cognitive processing are invited to skip it. We 
discuss communicators and their structure, the computer metaphor, the ecosys- 
tem metaphor, memory and mental representations, their different forms from 
frames to MOPs, integrated representations, understanding, and the production 
of utterances. The embedding of summarizing in a communication situation is 
natural, but it is currently not much emphasized in research. Since summariza- 
tion depends on the situational context as other cognitive achievements do, 
and since we go through different types of summarization, it is useful to con- 
sider the environment of the summarization activity. Therefore each subse- 
quent chapter is introduced by a short discussion of the situational framework. 

The third chapter deals with summarizing in everyday communication. We all 
summarize very often, when reporting about the movie we saw yesterday or the 
negotiations during a meeting, recording an accident, or writing a resume of a 
stage play at school. Everyday summarizing skills belong to everybody’s com- 
munication competence. They may be further developed to professional profi- 
ciency. A journalist who reports today’s discussions in Parliament on TV must 
have more developed summarization skills than a pupil who has been con- 
fronted only a few times with a summarization task, although journalists do not 
yet qualify as summarization experts. In a communication situation, a sum- 
marizer has a summary recipient or user as partner. While we know almost 
nothing about summary users, we have interesting theories and reports about 
the summarization activity. They build upon theories of discourse processing. 
We learn about macrorules and macrostructures, the role of schemata, and 
more varied summarization strategies or operators. After that, the discussion 
turns to the central problem of summarization, namely relevance assessment 
(deciding about the importance of information units). In particular, importance 
may be judged by looking at the incoming information and its semantic structure,
in relation to communication needs, or considering it from the viewpoint
of the summary user. 

In the fourth chapter, professional summarizing is discussed. The forms of 
professional - expert - summarizing are more developed than everyday sum- 
marizing can ever be because experts have much more training. The presenta- 
tion focuses on the summarization skills of professional abstractors and index- 
ers. This can be explained by the research situation: among professional 
summarizers, they are the best investigated. Whereas the communication 
situation in everyday summarizing can differ within a wide range from face-to- 
face to indirect mass-media style communication and from a simple conversa- 
tion environment to heavily supported computer conferencing, the environment 
of expert summarization is always organized and technically supported, and 
normally it is computerized. The chapter begins with a review of the cognitive 
science research on abstracting and indexing. Then we report an in-depth study 
of six expert summarizers at work, comprising a general description of their 
summarization techniques and their process organization, giving examples 
from real observed working sequences with their theoretical interpretation. 
Then follows the summarizers’ intellectual toolbox which lists all 552 observed 
strategies in a functional order. This part of the book provides a grounded the- 
ory of expert summarizing. It is backed up by a computer model (the accom- 
panying SimSum system; see below). 

Much research about summarizing has been done from the perspective of 
automation. Chapter 5 therefore presents computational approaches. Here, the 
summarization situation is inevitably computerized, and in particular the role 
of the summarizer may be ascribed to a computer system. Automatic systems 
still fail to reach human cognitive performance. In spite of this, the perspective 
of automated summarizing itself is very enticing. There is an increasing need 
for summarization functions, stimulated by the growth of computerized infor- 
mation and communication systems, with an additional push from the recent 
expansion of the Internet. Computerized information is also driving summariza- 
tion researchers to consider image and sound media and thus to expand from 
text to multimedia summarizing. To date, many systems have proposed the
extraction of original text passages as a feasible substitute for summarizing. The
considerations which steer the extraction are their main research objective. Other
approaches carry out knowledge processing using semantic representations and 
cognitively plausible methods or combine techniques from different back- 
grounds, e.g., shallow morphosyntactic analysis with statistical corpus linguis- 
tic constraints and semantic knowledge from an ontology. Increasing activity in 
the field is very evident, and several new approaches can be discussed. 

The SimSum simulation system demonstrates the cognitive mechanism of ex- 
pert summarizing. It shows in four sequences how experts solve important cog- 
nitive tasks: 

• writing the topic sentence of the abstract

• online summarizing with instant summary production 

• indexing 

• strategic document exploration 

References are listed at the end of each chapter. I have tried to avoid heavy 
bibliographic luggage, and equipped readers instead with more immediately 
useful bibliographic hints. The cited work provides further bibliographic guid- 
ance. The worst reproach that can be made regarding this solution is that it is 
subjective. I take this risk freely, and take comfort in the absence of long lists 
that fall behind a moving field. 



About information views and quality 

The way information is presented depends at least in part on the medium. In a 
combined medium, we expect more presentational variety. For most people, 
variety is pleasant, whereas monotony tends to be boring. As is well known, paraphrases
of the same assertion help us to understand. For this reason, authors repeat
their main points in different guises. When a subject is explained via several 
media, the chance improves that in this way or that way it will become clear 
to everybody. 

Especially in complicated domains, however, it can become cumbersome to 
identify the same information under different presentational forms, or to distin- 
guish differences which are merely presentational from others that relate to 
meaning. Since good metaphors are helpful, changing metaphors is not unusual 
when explanations serve different aims. A human being can be seen as a re- 
sponsible and active entity, for instance in the face of the law, but from a cyto- 
logical viewpoint (s)he is also a sort of pond in which all sorts of cells live and 
travel following their own destinies. The cytological view is bound to specific 
media conditions, namely magnification by several thousand times. 

In Summarizing Information, units appear under different views in an analo- 
gous, but less extreme fashion. To simulate the cognitive processes of summa- 
rization, they must move from the human cognitive apparatus as environment 
to a computer, and we must visualize them. In particular the main entities, the 
strategies, must be reconstructed to make them fit for their new habitat on a 
computer screen. In their empirical presentation, strategies have a name, a 
definition, a set of users, and their place in the functional order of the toolbox 
(see Fig. 1.2). They occur in observed working steps. For visualization and im- 
plementation in a multimedia system, they are changed to object-oriented 
agents with a zoomorphic appearance, borrowing the first metaphor from the 
programming system that equips them with computational activity and the 
second from traditional fables. Both strategies and agents are good at handling 
the flexible organization of human cognition. To attentive readers the SimSum 
agents reveal themselves to be descendants of Selfridge’s cognitive demons 
(SELF59) “who” shriek when they encounter appropriate data. Their decision
demon selects the loudest. Since on the screen only small creatures can be 
presented three-dimensionally, we end up with cute insects such as ladybirds. 
With a mouseclick, every zoomorphic surface actor presents its explanation 
(see Fig. 1.2). Readers need some imagination to handle such related items in 
different guises: an agent such as hold is a realization of the strategy of the 
same name, but in spite of the analogy in function, the strategy and the agent 
are also different. 
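
The demon mechanism mentioned above can be illustrated with a minimal sketch.
It is not the SimSum implementation, whose code is not shown in this book
section. It only assumes what the text states: an agent carries a name and a
definition, agents shriek when they encounter appropriate data, and a decision
demon selects the loudest. The class Agent, the function decision_demon, the
agent skip, both definitions, and both loudness rules are hypothetical and
invented for illustration. Only the agent name hold comes from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical stand-in for a strategy realized as an agent."""
    name: str
    definition: str
    # How loudly the agent "shrieks" for the given data (0.0 means silent).
    loudness: Callable[[str], float]


def decision_demon(agents: List[Agent], data: str) -> Agent:
    """Select the agent that shrieks loudest for the given data."""
    return max(agents, key=lambda agent: agent.loudness(data))


# Illustrative agents: only the name "hold" appears in the text. The agent
# "skip", the definitions, and the loudness rules are assumptions.
agents = [
    Agent("hold", "keep an item available for later use",
          lambda data: 1.0 if "result" in data else 0.1),
    Agent("skip", "pass over an item",
          lambda data: 0.5),
]

winner = decision_demon(agents, "the main result of the study")
print(winner.name)
```

In this sketch the loudness rules are fixed functions. In a fuller model they
would depend on the summarizer's current knowledge state and working step.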




Fig. 1.2. Strategies and agents 



The printed book by and large presents two different information types. Chapter 
4 is an original report on expert summarizing. This report presents finer-grained 
knowledge, describing methods and results of an investigation and demonstrat- 
ing a number of real expert summarization steps and sequences with their theo- 
retical interpretation. Any possible lack of integration in this part can be 
blamed on the author. Presentation methods are more varied in this chapter. 
Among other things, we look at real-world data and their grounded theory. The 
form of their presentation, a drawing combined with a verbal comment, has a 
long tradition, but nevertheless it may be unfamiliar to some readers. 

The remainder presents “second-hand” information, coming near to the usual 
literature survey style. However, we do not know about summarizing in the 
same way we know about onions or oak trees. The approaches to summarizing 
that parade in front of the reader are all individual. They express different opin- 
ions about their subject, accessing it from different angles. In summarization, 
research has not been steered by anything like a research program or even a 
set of common scientific convictions. The studies are sometimes ad hoc, 
heuristic, practically oriented, or guided by theories that have no relation to 
each other. 

Collecting heterogeneous knowledge in a monographic form puts it together, 
but does not amalgamate it. On the contrary, the contradictions and the lack of 
integration become even more visible. The lack of homogeneity may produce a 
feeling like stumbling from rock to rock, and sometimes into a pothole where 
investigations inspired by cognitive theories are rare. Inevitably, the reader will 
feel the consequences when under way through the following chapters. Both
author and reader have to accept the rugged landscape of summarization for 
the time being, before they can advance the cultivation of the field, by further 
developing this or that theory, by teaching this or that summarization tech- 
nique to their students, or by applying a newly acquired strategy in their sum- 
marizing practice. 


2 Communication and Cognition



2.1 Introduction 



Summarization is a situationally and communicatively bound cognitive task. In 
the following, we prepare the discussion of situations where summarization 
takes place, starting with the communication situation. We proceed from a 
classic, simple configuration, in which only two communicators appear in a 
given context. They communicate for the most part, but not exclusively, 
through language. 

What happens during summarization can be explained better if we know to 
some extent how the cognitive system, i.e., our thinking and communication 
apparatus, works. Therefore, we shall analyze the communication situation and
then turn our attention to the internal structure of communicators. In order to 
process knowledge, communicators need a memory where they organize and 
store knowledge. Moreover, they must be capable of assimilating (understand- 
ing, learning) knowledge from their environment, and they must be able to 
impart parts of their knowledge to others. Thus, three principal components of 
human communication emerge for discussion: the store of knowledge in 
memory, understanding, and the generation of utterances. Since most of what 
we know about human knowledge processing relates to linguistic knowledge, 
the discussion concentrates on this, following the approach of RICK93. Read- 
ers with a wider scope of interest may imagine the communicator, discourse 
understander, and generator as a “cognitive virtual machine” (PYLY85) and 
take discourse processing as an example for general knowledge processing. 


2.2 Communication situations



A situation is a framework of references taken from real-world relationships 
which is punctuated by topics and articulated by objectives and plans of action 
that are arranged concentrically and become both more anonymous and diffuse 
with increasing spatiotemporal and social distance. (HABE81) 

Communication situations are structured by the interconnected relationships of 
day-to-day life. They bring two or more communication partners together. 
Communication between them as a rule serves to master a situation: people try 
to solve day-to-day problems by communicating with each other about them. 
Even situations that are not particularly problematic produce numerous com- 
munication needs. The mapping between intellectual (“symbolic”) and prac- 
tical activities is generally not direct, but mediated through a number of inter- 
pretations. Communicative activity has first and foremost the function of regu- 
lating the practical activity and supporting intellectual activity by objectifying 
and abstracting. Like practical and intellectual activity, communicative activ- 
ity is divided into functionally interrelated modules and subtasks. It has a mo- 
tive and a goal and involves a sequence of acts and operations that serve to 
reach the goal in situationally appropriate steps. 

To illustrate these general statements about communication situations, let us 
imagine two communication partners engaged in a game of table tennis. What 
they do with bat and ball is thus a form of active communication; observing 
the opponent, his gestures and facial expressions, provides information to both 
players. Furthermore, the characteristic background noise makes it possible 
above all to follow the movements of the ball. Embedded in this overall 
situation is the verbal communication between the two players. The extent of 
this depends on situational factors. During a concentrated rally, the players will 
generally not have much energy to spare for idle chat. If, however, they have a 
difference of opinion, the game gives way to a verbal exchange. The minimum 
verbal communication during the game is a special summarization: with 
utterances like “two all” one of the players or an umpire continually 
summarizes the score whenever it changes. 

A distinctive feature of communication between people is that the meaning 
of utterances is not fixed from the outset, but must first be implicitly negoti- 
ated by the communication partners. Day-to-day communication involves many 
negotiation processes aimed at discovering the meaning attributes intended by 
the communication partners, or at defining them in the first place. Since the 
communication partners must often first agree on the topic of communication 
and on possible interpretations of the information conveyed, such negotiation 
processes imply a certain communication instability. 

The determining situational factors steer, in a general way, the behavior of 
communicators. The situation determines the communicators’ scope of action, 
presenting them with affordances (points of departure) for communicative or 
active action (GIBS79), but also with restrictions. Since communicators can
free themselves from the determining factors of a situation only to a limited 
extent, their communicative and cognitive activity bears the traces of these 
factors. If a report is to be handed down to posterity hewn in stone, great care 
can and will be taken in elaborating the content. If, on the other hand, a jour- 
nalist is summarizing the discussion taking place in an ongoing legislative 
session for the television news, then her or his report is intended for immediate 
audiovisual consumption and can subsequently be forgotten. Come what may, 
(s)he must have the report ready in time for the transmission, if necessary at 
the expense of quality. In other words, the communicator is constrained by 
time restrictions. Neither is it irrelevant for whom (s)he is reporting. A TV 
journalist has an audience in mind and knows, for example, that for children 
(s)he must change to a different communication style. 

In a communication situation, communicators typically assume two roles, 
that of speaker or text producer and that of listener or reader (see Fig. 2.1). 
They often take turns in these roles. Thus both participants must at least in 
principle be able to produce and perceive texts. For successful communication, 
the communication partners must have a common basis. They need a largely 
consistent knowledge about the intent of the communication. This is also re- 
ferred to as communication conventions. This knowledge about communication 
aims and standards can also result in expectations. It can, for example, in a 
more or less binding way regulate the form of the interaction, i.e., determine 
turns of speaking and listening, question and answer, request and help, greeting 
and response. The greater the shared pool of communicatively relevant knowl- 
edge, the more likely the communication is to succeed, since the communica- 
tors can fall back on this common background. 

Fig. 2.1. Communication situation (diagram labels: situation; discourse world;
communicator / producer; communicator / recipient; discourse; effector)



Situation and communication are closely interconnected. Communication goes 
hand in hand with sensory perception. In a communication situation, the
surroundings of a situated communicator consist of the sum of objects that are
functionally related to her or him. These may be other communicators, for
example the producer of a text for reception, but also other factors that have an
impact on the communication, such as a noise source that makes it difficult to 
listen. The situational position of the communicator results from the sum of 
relations to the discourse world (see Fig. 2.1). A communicative situation is 
always subjective, i.e., it is specific for one communication partner. It does not 
correspond entirely to the situation of other communicators. 

Communication is tied to the principle of relevance, i.e., one communication 
partner expects the statements of the other to influence his or her cognitive 
state in the current situation (SPER87). There must be a reason why the 
speaker chooses to talk, namely to somehow influence the cognitive state of 
the recipient. This is the communicative function of a discourse, or of a com- 
munication act. Frequent functions of communication acts are to inform, warn, 
instruct, convince, entertain, or amuse. None of them can be achieved without 
the response of a recipient who accepts information or a warning, who learns, 
changes beliefs, laughs, etc. This implies the acknowledgement and the active 
reconstruction of the message by the listener, reader, or user of a discourse. 
Communication is built on interaction to a higher degree than traditional mod- 
els depict it. Frequently, these confront an active speaker with a passive lis- 
tener who acts as a consumer of information. 

What takes place during communication cannot be reduced to the processing 
of a medium - for example the English language. First, it makes no great dif- 
ference to us if a second medium - such as a video film - is introduced, since 
what we process is the conveyed knowledge and this can to a large extent be 
extracted from any suitable form of presentation. The form of presentation and 
the medium may help cognitive processing, but cognition works and frequently 
succeeds also without much presentational support. We notice that to a certain 
extent, media are exchangeable. Cognition does not depend too much on a 
specific presentation. Second, we can find all kinds of contents transmitted by 
the same medium. The English language, for instance, states both today’s 
stock exchange results and the unhappy destiny of Desdemona; a movie ex- 
plains to us the principles of welding as well as the recent adventures of Don- 
ald Duck. 

Normally, communication models do not constrain much what is communi- 
cated. They allow for any kind of speech acts or discourses. A discourse may 
be a dialogue about fishing, a lengthy conversation about the neighbor’s trav- 
eling adventures, or an exchange of speeches celebrating the anniversary of 
the French Revolution. A general communication model is also insensitive to 
the dramatic difference in cognitive effort a discourse may involve: talking 
with a neighbor is easy and needs no particular energy, while writing the Di- 
vine Comedy or the Decameron would have probably been beyond the capaci- 
ties of most readers and of the writer of this book. 

Communication happens mostly through language. Unless interactions are 
extremely simple, they use meaning aggregates as a means of expression,
which for the sake of simplicity we will call texts or discourses. Texts or
discourses can contain subtexts. They can integrate various media (written and
spoken language, mimicry, illustrations, etc.). A text is a unit of meaning that 
is required in order to realize a communicative act. In situations in which 
summaries can play a role, communicators are concerned with generating and 
perceiving complexes of information of this kind. 

In written communication or mass communication, for example when pub- 
lishing a book or broadcasting a TV series, many recipients are distributed in 
time and place. In such distributed communication, common background 
knowledge is just as necessary for understanding as in face-to-face communi- 
cation. A certain consensus must also exist over how to handle each other and 
the transferred knowledge. If for some reason the common background shared 
by the communicators becomes too weak, communication problems will in- 
crease. Both the producing and the receiving communicator are required to 
make a cognitive effort to ensure that communication can succeed. 



2.3 The cognitive structure of a situated communicator 



It matters what communicators have in their brains. The core problem in be- 
coming a new Dante or Boccaccio is conceiving a discourse at their level of 
competence. It is first and foremost a problem of our creative cognitive compe- 
tence as communicators. Generally and technically speaking, communicative 
action presupposes that a communicator can represent the states of the dis- 
course world. This is accomplished by memory. It stores an internal (mental) 
model of the discourse world (see Fig. 2.2). Models stored in memory enable 
us to consider a state of affairs without it being physically present. This also 
makes it possible to plan changes to states of affairs without necessarily
having to carry them out there and then. For example, we can mentally rotate
an object and view it from behind before actually turning it round.

Since large amounts of knowledge are stored in a communicator’s memory, 
there must be some very efficient internal organization that explains the fast 
retrieval of knowledge items during communication. In order to accomplish this 
task, a communicator’s internal representation system must be modular and 
integrated. 

A system is modular if it consists of components that only communicate with 
each other (or with the environment) via clearly defined interfaces. The struc- 
ture of the system is determined by the components and the relations between 
them. If the structure of the system contains no gaps, the system is coherent. 
This does not mean that the individual subsystems melt into one diffuse unit. 
Rather, they may form separate modules, but at the same time they interact 
closely with each other. Since communication between human beings not only
requires linguistic knowledge, but also other knowledge types and various
sensorimotor processes that all have to be processed more or less in parallel, what
we are dealing with is a complex system. Indeed, it can only function if it is 
well organized. The interesting question is how it is organized. 



[Figure: diagram labels are situation, communicator / producer, mental model
of discourse world (shown twice), discourse, discourse world, and effector.]

Fig. 2.2. Communication with mental models



2.3.1 The role of metaphors: The library metaphor, 
the computer metaphor and the ecosystem metaphor 

Whereas the nervous system as the physiological basis of human cognition can 
be examined relatively directly, the conditions for investigating how the active 
human cognitive system functions are less auspicious. As a rule we can 
establish hypotheses about the sources of human cognitive capacity only indi- 
rectly, by observing the performance of individual subjects. 

To gain a better understanding of what goes on in human knowledge process- 
ing, we need a means of comparison whose characteristics are known and can 
be projected onto the much less familiar human cognition. 

We may, for example, conjecture that human memory is organized like a li- 
brary. We can then examine, empirically or experimentally, whether the mem- 
ory behaves in the way the library metaphor predicts. For example, the library 
metaphor predicts that rarely used material will require especially long access 
times, since it has to be called up from a (mental) external or background 
store, whereas knowledge that is constantly used is readily available in a 
(mental) open-shelf arrangement. As long as empirical and experimental find- 
ings support them, we can apply organization principles that have been derived 
from the function of a library to human memory. 
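
The prediction of the library metaphor can be made concrete with a small
sketch. It is only an illustration under stated assumptions: the class name
LibraryMemory, the shelf size, and the two access costs are hypothetical and
are not taken from any empirical study. Frequently used items are treated as
standing on the open shelves, everything else as held in the background store.

```python
from collections import Counter


class LibraryMemory:
    """Toy two-tier store: open shelves (fast) and a background store (slow)."""

    def __init__(self, shelf_size=2, shelf_cost=1, stack_cost=10):
        self.shelf_size = shelf_size  # number of items that fit on the open shelves
        self.shelf_cost = shelf_cost  # assumed access time, arbitrary units
        self.stack_cost = stack_cost  # assumed access time for the background store
        self.items = {}               # key -> stored knowledge item
        self.uses = Counter()         # key -> number of past retrievals

    def store(self, key, value):
        self.items[key] = value

    def on_open_shelf(self, key):
        # The most frequently retrieved items count as "open shelf" material.
        top = [k for k, _ in self.uses.most_common(self.shelf_size)]
        return key in top

    def retrieve(self, key):
        # Cost is decided by past use, then the use counter is updated.
        cost = self.shelf_cost if self.on_open_shelf(key) else self.stack_cost
        self.uses[key] += 1
        return self.items[key], cost
```

If the metaphor holds, measured recall times for rarely used knowledge should
resemble the higher cost returned here; if they do not, the organization
principle borrowed from the library should not be projected onto memory.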

For a broader view of human cognition we may use the computer metaphor, 
because it helps us to comprehend essential characteristics of human informa- 
tion processing. In this metaphor, communicators are cognitive systems that 
can process information. To begin with, it is irrelevant whether they are human 
beings or information-processing machines. They are systems in the sense that 
they possess two or more components that are functionally related to each
other. The components are subsystems that in turn may contain other
subsystems. This view of things is known as the computer metaphor of information
processing. It obtains its knowledge-deriving value primarily from the fact that 
a computer is an artifact that is constructed right down to the last detail and 
can therefore be fully understood. It therefore allows us to form hypotheses 
about difficult aspects of human information processing that can then be em- 
pirically and experimentally tested. 

Situated communicators interact with their environment. Comparable com- 
puters can at best be imagined as information-processing robots. They can not 
only process information but also “perceive” their environment with cameras
and sensors, draw attention to themselves by means of appropriate instruments 
(such as a loudspeaker), and change their environment with robot arms or other 
effectors. They have not only sensorimotor activities (e.g., pointing gestures or 
lifting objects), but also social “awareness” and social behavior. This presup- 
poses that their cognitive systems are dynamic. Conditional upon an input, the 
state of the system can change. 

The computer metaphor has its limits. For example, human communication 
is far more situation-dependent than current computer systems, and it is, 
among other things, far superior to computers in its ability to process several 
units of information concurrently (in parallel). Even if we advance from the 
general computer metaphor to that of the robot, this does not alter the fact that 
a computer is an electronic device that is very different from man in many re- 
spects. 

Hence the interaction of other lifeforms with their environment is being used 
more and more as a model in studying human information processing and de- 
signing computer systems. Lifeforms are embedded in their environment and 
are in interaction with it. The rapport between lifeform and environment is usu- 
ally known as an ecosystem. The ecosystem metaphor thus sees an informa- 
tion-processing system as being integrated in the interaction with its environ- 
ment and as resulting from it. The application of the metaphor to human be- 
havior leads to a theory of situated behavior (CLAN93, VERA93, CLAN97). 

A cognitive system must be considered in the interplay with the information 
it processes, as many characteristics of an information processor only become 
recognizable in the interaction with the information in a specific situation. 
Cognitive processes of information processing must therefore be examined in 
different situations, because they turn out differently depending on the envi- 
ronment. For a communicator with a summarizing task it is certainly not ir- 
relevant whether the recipient shares her or his view of the state of affairs in 
question or whether the summary is produced orally or in the form of a written 
text. The results and the cognitive processes differ accordingly. For research 
methodology it therefore follows that a comprehensive analysis in a natural 
environment is the best way of understanding communication processes,
summarization processes included.



2.3.2 Systems structured by levels and modules

By analogy with computer systems we can subdivide the system of human 
knowledge processing into various system levels. NEWE82 (see Fig. 2.3) 
shows how in a computer system the level of knowledge processing is sepa- 
rated from the hardware (the device level) by a number of intermediate system 
levels. In a similar way, human knowledge processing competence must be 
“assembled” on the physiological system (especially on the nervous system). 
The components of any one level - for example, neurons in the brain - coop- 
erate closely with each other. They provide the functionality that is required on 
the next higher processing level. Through the interaction within the individual 
levels and the cooperation over several system levels, system characteristics 
emerge that could not be explained by the possibilities given by one single 
system level. Such characteristics of human cognition are known as emergent. 
Emergent behavior can be observed, for example, in retrieval from memory.
We all know from our own experience that we can recall a person’s name and
immediately transpose ourselves back into the situation in which the person
first appeared in our lives. This may cause our heart to
pound, and thus influence current behavior. 



Configuration level    knowledge level
                       program (symbol) level
                       register-transfer sublevel

Logic level            logic circuit level

Physical level         circuit level
                       device level

Fig. 2.3. Level organization of a cognitive system (according to NEWE82)



In the same way as the knowledge processing system in Fig. 2.3 reveals a level 
organization, with the configuration level, for example, encompassing three 
sublevels, we can also presume that human cognitive systems have a multi- 
level organization. The three most important levels of the human cognitive 
system are the biological, the psychological, and the social level: 

• Cognition is always supported by a biological system. The most important 
contribution of the organism to cognition is made by the central nervous 
system, but other organs that are in contact with the environment also con- 
tribute to the emergent characteristics of life, without which cognition 
would be inconceivable.

• The human mind with all its capacities, how it develops on the basis of 
the biological system and interacts with its environment, represents the 
psychological level. 

• The interaction of human beings gives rise to social systems, whose char- 
acteristics in turn reach beyond what individual people can encompass. 

Figure 2.4 shows a situated communicator that with sensors (in the case of 
human communicators eyes, ears, and other sensory organs) and effectors (in 
the case of humans primarily articulation organs and the hands) resembles a 
robot. Through their sensory organs and effectors, robot-like communicators are 
capable of interacting with their environment. Furthermore, in the commu- 
nicative situation they refer to a discourse world which they build up together, 
i.e., to facts they can both imagine. Messages (“information”) about these
facts are exchanged in the form of discourse. 

Inside the system, the situated communicator has subcomponents that de- 
scribe important knowledge aggregates. In the model, they represent processing 
levels (representations), which - as the arrows indicate - both in the produc- 
tion and the reception of statements can be copied onto one another. The 
knowledge units are linked to processing routines. 

The metacognitive component (FLAV81, WEIN87) monitors the overall 
cognitive activity. Metacognition protects the person’s self-image and ability 
to function socially, i.e., the personal responsibility. It ensures that the per- 
son’s intentions are realized with recourse to the available cognitive resources. 
It plans the communicative procedure for a specific task. At least for routine 
tasks such as riding a bicycle it possesses ready-made working plans, which 
represent the competence of the individual (MILL73, HAMM90). They are 
adapted to the current requirements (opportunistic planning - ALT88). The 
metacognitive component also possesses knowledge for error management. For 
example, the communicator can react to a slip of the tongue (sensorimotor 
level) by adjusting the discourse plan (presentation level) in such a way that 
the articulation error is concealed and reinterpreted as a new expression. A 
communicator who has assumed the role of recipient can among other things 
also retrieve an information unit from the sensory memory storage if the 
interpretation did not succeed the first time round. Both functions demand a 
conscious monitoring of the communicator’s own cognitive activity. If errors 
occur, they are corrected. If for any reason the results are unsatisfactory, they 
are improved. 

General world knowledge and discourse-related factual knowledge enable us 
to build up an object model (a mental model) of the discourse and to relate it 
to the domains it refers to. If the discourse is about butterflies, knowledge 
about butterflies will be accessed together with additional knowledge that is 
needed for its conceptual embedding. This may concern other insects, such as 
dragonflies, or the living environment of butterflies.



Fig. 2.4. A situated communicator



Presentation knowledge allows communicators to put their object model into a 
form suitable for external representation. Part of this presentation knowledge is 
independent of language. For example, we may imitate the flapping of a but- 
terfly to illustrate how butterflies move or we might draw a butterfly in order to 
characterize its shape. Even when speaking or writing about an object, we use 
non-language-specific presentation knowledge. This includes rules of
information organization. Such rules prompt us, for example, to subdivide longer
statements into parts and to label them. They frequently hold for several media such
as written texts and movies. Many of them are valid more or less independ- 
ently of individual languages in all cultures. 

The situated communicator’s knowledge equipment is open for additional 
knowledge. Curious readers can anticipate the discussion about summarization 
expertise by jumping to Fig. 4.17 below, comparing it to Fig. 2.4, and inte- 
grating summarization expertise in the situated communicator’s competence. 



2.3.3 Communication ability in real-world situations 

Most communication is relatively fast. Adults speak at a normal rate of (and 
conceive meaning for) 5-6 syllables per second. A listener is assumed to
understand at the same pace. Both speaker and listener monitor the situation,
respect communication and social norms, and may cope with additional
activities such as lateral thinking and driving a car. This implies an impressive
processing capacity. Cognitive processing during communication must be or- 
ganized to ensure fluent interaction. This means first and foremost that data 
must be processed as soon as possible. Behavior must be incremental and stra- 
tegic, i.e., at any given moment every available item is processed as far as 
possible. Under these conditions, errors are inevitable because information is 
often incomplete and processing must be fast. To avoid bad results, metacogni- 
tive monitoring and error managing strategies are at work. When a solution 
turns out to be unsatisfactory, the monitoring component usually recognizes it. 
The bad solution is withdrawn and replaced by a better one. 

Flexible real-time communication is largely made possible by cognitive 
strategies. Strategies are pieces of procedural knowledge; they are seen here at 
any level of granularity. They are not distinguished from tactics. Strategies 
have qualities that make them differ both from plans and from rules: 

• A strategy states how a human agent best arrives at a particular goal. 

• Whereas a plan describes a macroaction and its goal, a strategy also in- 
cludes the means by which the goal can be achieved. 

• Whereas a rule of behavior stipulates the correct form of behavior and is 
safe, a strategy describes an efficient procedure that can also entail risks. 

• Strategies are a part of our procedural knowledge - learned, possibly habi- 
tualized and automated. When new demands are made on us, we will de- 
velop new strategies. 

• Every time a strategy is applied, it takes the concrete data, the actual 
aims of the person, and the other processing conditions into account. 
Strategies can combine different types of input data, and can also activate 
other strategies. 

• A mental or a cognitive strategy is a carefully directed and controlled 
mental process. The aim of a cognitive strategy can be global and allow a 
division into subgoals, or so elementary that one strategy alone is suffi- 
cient to deal with it. 

• On the implementation level, strategies can be imagined as production 
rules with the respective entry conditions and actions, or as agents pro- 
grammed in any suitable language. 
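
The last point can be illustrated with a minimal sketch of strategies as
production rules. Everything named here is an assumption made for the sake of
illustration: the Strategy class, the run loop, and the two toy summarizing
strategies (keep a sentence that contains a cue word, otherwise skip it) are
hypothetical and do not reproduce any system described in this book.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # the data the strategies work on


@dataclass
class Strategy:
    name: str
    condition: Callable[[State], bool]  # entry condition
    action: Callable[[State], None]     # action carried out when the rule fires


def run(strategies, state, max_steps=20):
    """Repeatedly fire the first applicable strategy until none applies."""
    trace = []
    for _ in range(max_steps):
        for strategy in strategies:
            if strategy.condition(state):
                strategy.action(state)
                trace.append(strategy.name)
                break
        else:
            break  # no entry condition is satisfied: stop
    return trace


def make_summarizing_strategies(cue="important"):
    """Two hypothetical strategies: keep cued sentences, skip the rest."""
    def has_input(st):
        return st["pos"] < len(st["sentences"])

    def current(st):
        return st["sentences"][st["pos"]]

    def keep(st):
        st["summary"].append(current(st))
        st["pos"] += 1

    def skip(st):
        st["pos"] += 1

    return [
        Strategy("keep-cued-sentence",
                 lambda st: has_input(st) and cue in current(st), keep),
        Strategy("skip-sentence", has_input, skip),
    ]
```

The order of the rules encodes a preference: the more specific strategy is
tried first, and the general one serves as a fallback. This is one simple
way to let strategies react to the concrete data; richer conflict resolution
is possible but is not shown here.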

From the observed communication performance, KINT83 have derived func- 
tional principles that enable the cognitive processing apparatus to perform. 
They formulate a set of assumptions about human cognitive architecture with 
respect to discourse processing:

Cognitive assumptions

1. Constructivist assumption. Observing an event involves the construction of a 
mental representation of the event. Understanding a story about an event or a 
videorecording of an event involves the construction of a separate mental re- 
presentation of the story or videorecording on the one hand and of the event on 
the other. The representation of the event and the representation of the story or 
videorecording are not identical. Both the person who listens to the story and 
the persons who observe the event itself will execute a very similar process 
which uses the incoming visual or linguistic data to construct a representation 
in memory. 

2. Interpretative assumption. Both the eye witness of the event and the lis- 
tener to the story of the event do not merely represent the visual and linguistic 
data but also interpret the events and the story. The representation of a text is 
produced not by a simple conversion of the input text into an internal repre- 
sentation but through its interpretation. This includes the addition of the inter- 
preter’s own knowledge. 

3. Online assumption. The meaning of input data is constructed more or less 
at the time of processing. Understanding takes place in parallel to the process- 
ing of input data, gradually, and not post hoc. Using the computer metaphor, 
we can call this the online assumption of discourse processing. 

4. Presuppositional assumption. People who understand real events or speech
events are able to construct a mental representation, and especially a mean- 
ingful representation, only if they have prior knowledge about such events. The 
event may be an accident. Then an understander should know something about 
street traffic to make sense out of the observed scene. To interpret a story 
about the accident correctly, the person must have some general knowledge 
about stories and the relationship to the events they tell of. In addition to this 
knowledge, the understander may have other cognitive information, such as be- 
liefs, opinions, or attitudes regarding such events in general, or motivations, 
goals, or specific tasks in the processing of these events. 

5. Strategic assumption. People are strategic in their information processing. 
In order to be as effective as possible in the construction of the mental repre- 
sentation, they flexibly make use of various kinds of information, they process 
information in several possible orders, and they cope with incomplete infor- 
mation. 

Contextual assumptions 

6. Social functionality assumption. Discourse, and hence the process of under- 
standing a discourse, is functional in its social context. It follows that discourse 
producers and understanders construct a representation not only of the 
discourse, but also of the social context, and that these two representations
interact. When a story is told and understood in a process of communication, a
listener acquires information from the speaker. The listener will in this
situation not only attempt to construct her or his own representation of the story but
also match her or his interpretation with a representation of the assumptions 
about what the speaker intended the listener to understand. 

7. Pragmatic assumption. Anyone who tells a story engages in a social act. 
People may want to entertain, to warn, or to assert something by storytelling. 
The form and the interpretation of the story may depend on this intended prag- 
matic function. This also means that listeners will evaluate a discourse on a 
number of points relative to the intended pragmatic functions. It may be found 
inadequate if it does not match contextual conditions or if it does not conform 
to its pragmatic aim. This may be the case if a joke is not funny or if it is told 
when a toast was more appropriate. 

8. Interactionist assumption. The interpretation of a discourse is embedded 
within an interpretation of the whole interaction process taking place between 
the discourse participants. Both the speaker and the listener will have motiva- 
tions, purposes, or intentions when engaging in interactions, and the same 
holds for the further actions to which the verbal actions are related in the same 
situation. The interactionist assumption means among other things that com- 
munication partners construct a cognitive representation of the verbal and non- 
verbal interaction. 

9. Situational assumption. In a social situation, the interlocutors usually play 
a situational role. There may be differences in location or setting, and there 
may be specific rules, conventions, or strategies governing possible interac- 
tions. In order to be able to understand a story, we have to link its pragmatic 
function to the general interactional constraints of the social situation. A story 
about an accident will be interpreted differently when told by a witness at a 
court trial related to that accident and when told by a friend during a coffee 
break. General norms, values, attitudes, and conventions about the interaction 
in a particular situation are part of the presuppositions that must be accounted 
for during discourse processing. For instance, the witness will be bound to the 
plain truth more strictly than a person in an informal situation. 



2.3.4 Memory and mental representation 

Cognitive systems store their world knowledge (the models) in memory. In the 
case of human beings, by analogy with the computer, we distinguish between a 
working memory with fast access and a long-term memory in which general 
and personal (episodic, i.e., related to episodes in life) permanent knowledge 
is stored (for more see BADD90). 

In semantic memory, people store knowledge that is built up through
perception and cognitive processing. The mental representations retain the
significance of the objects for the individual. They are cognitive models of the
objects and events they refer to. Mental representations have an important
function in coping with more complex objects, since they reduce complex objects
to their most important characteristics.

The working memory comprises all activated units at any given time. In 
working memory, new information is assimilated, organized, and linked to al- 
ready processed information from the discourse and from world knowledge. In 
order to ensure that this integration is carried out with maximum efficiency, 
only those units of previous knowledge are activated that are probably needed 
for the integration. Everything else is stored in the long-term memory, from 
where it can be reactivated as required. What is not assimilated in long-term 
memory is forgotten. 
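
The division of labor between the two stores can be sketched as follows. The
sketch rests on simplifying assumptions that are not claims of the text: a
fixed working-memory capacity, activation of all stored associates of a new
item, and displacement of the oldest active unit. The class and method names
are hypothetical.

```python
class Memory:
    """Toy model: a long-term store plus a small set of activated units."""

    def __init__(self, capacity=3):
        self.long_term = {}       # concept -> set of associated concepts
        self.working = []         # activated units, most recent last
        self.capacity = capacity  # assumed limit on simultaneously active units

    def learn(self, concept, associates):
        # Assimilate knowledge into long-term memory.
        self.long_term.setdefault(concept, set()).update(associates)

    def activate(self, concept):
        if concept in self.working:
            self.working.remove(concept)
        self.working.append(concept)
        # Units beyond the capacity are deactivated, oldest first.
        while len(self.working) > self.capacity:
            self.working.pop(0)

    def process(self, new_item):
        """Activate a new item together with its associates from long-term memory."""
        self.activate(new_item)
        for associate in sorted(self.long_term.get(new_item, ())):
            self.activate(associate)
        return list(self.working)

    def can_reactivate(self, concept):
        # Only what was assimilated in long-term memory can be brought back.
        return concept in self.long_term
```

An item that has dropped out of the active set but was learned earlier can be
reactivated; an item that was only ever held in working memory cannot, which
corresponds to forgetting in the paragraph above.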

When a given knowledge representation is translated into a different one we 
talk of information processing. The transformation may vary in degree. Infor- 
mation may simply be translated from one code into another. However, the 
transformation can also affect its structure, for example by breaking down an 
information complex into several parts or conversely by consolidating a num- 
ber of individual units into an overall complex. Through inferences, new 
knowledge can be derived from existing information. 

We must assume that knowledge complexes in memory have the character- 
istics of mental models (JOHN83). Mental models must integrate different 
types of representation, in particular including an analogical one, in order to 
store features of objects as different as smell and price. Mental models are dy- 
namic, allowing us to represent a state of affairs depending on the knowledge 
requirements. Communicators can adapt themselves to concrete situations by 
restructuring and switching the mental models linked to the situation, for ex- 
ample at the market when noticing the smell of apples, comparing it with the 
perfume of peaches, and a moment later discussing their excessive price. 

Mental models represent objects, states of affairs, sequences of events, the 
way the world is, and social and psychological actions in everyday life. They 
enable individuals to make inferences and predictions, to understand phenom- 
ena, to decide what action to take and to control its execution, and above all 
to experience events by proxy. They allow language to be used to create 
representations comparable to those deriving from direct acquaintance with the 
world. They relate words to the world by way of conception and perception. 

Mental models are in people’s minds; what they look like in detail is an 
open empirical question. We can assume, however, that a primary source of 
mental models - three-dimensional kinematic models of the world - is percep- 
tion. Mental models cannot be unique for every state of affairs, rather they 
must be constructed from (reusable) tokens arranged in a particular structure. 
This structure matches the perceived or conceived structure of the represented 
objects. Even if a state of affairs is described incompletely or if the description 
lacks precision, a single state of affairs is represented by a single mental 
model.

Mental models integrate a number of conceptual codes: propositional ones,
spatial ones, imagery, motor skills encoded in the form of muscular programs, 
etc. These representations differ in their properties. For instance, wandering 
through spatial representations (“mental maps”) in the imagination may take
time. There must also be mappings from one code to another. 

Among the different representational forms, the propositional language of 
thought has the advantage of being directly expressible in language, and of 
having a computational side which is borrowed from first-order predicate logic. 
Propositions may be minimal, simply noting the existence of a concept 

“exists (canary)” - there is a canary 
or ascribing a feature to a concept 

“canary (color, yellow)” - the color of the canary is yellow. 

Propositions may be more complex and can be integrated into larger units of 
meaning organization such as frames and scripts. 
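
The two minimal propositions above can be written down as simple
predicate-argument structures. This is a sketch only: the Proposition type
and the helper functions are hypothetical names, and the encoding is one of
many possible ways to render such propositions.

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    predicate: str
    arguments: tuple

    def __str__(self):
        return f"{self.predicate}({', '.join(self.arguments)})"


# The two example propositions from the text.
facts = {
    Proposition("exists", ("canary",)),
    Proposition("canary", ("color", "yellow")),
}


def holds(predicate, *arguments):
    """True if exactly this proposition is among the stored facts."""
    return Proposition(predicate, tuple(arguments)) in facts


def feature_of(concept, attribute):
    """Return the value ascribed to an attribute of a concept, or None."""
    for p in facts:
        if (p.predicate == concept and len(p.arguments) == 2
                and p.arguments[0] == attribute):
            return p.arguments[1]
    return None
```

Because a proposition in this form is just a predicate with arguments, it can
be printed in a language-like notation, compared, and looked up, which hints
at the computational side mentioned above.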

The following explanation concentrates on well-known units of representa- 
tion - categories, concepts, relations, propositions, frames, scripts - reaching 
the highest degree of integration with memory organization packets (MOPs). In 
comparison to the declarative forms of knowledge representation, the represen- 
tation of procedural knowledge is only briefly described. There we find two 
styles of representation: production rules and simulation programs. 



2.4.1 Concepts with categories and properties 

The human mind organizes the world of experience in categories such as per- 
sons, objects, events, actions, states, times, places, or directions. These cate- 
gories are also known to correspond to obvious types of question: who did it? 
what was it? and so on (see Table 2.1). 

The broad categories mentioned so far are open for refinement: persons can 
be further described according to their age and sex, or with respect to their na- 
tionality or profession. Objects may be physical objects of different size, mate- 
rial, surface, and so on. By subcategorizing concepts with additional questions 
(does it sing? does it fly? - see Fig. 2.5) we arrive at an ontology, a hierar- 
chical structure that relies on the is-a relation between concepts, and describes 
concepts as bundles of features. 
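
The idea of an ontology built on is-a links and feature bundles can be sketched in a few lines of Python. This is only an illustration: the concepts, the feature names, and the helper functions `ancestors` and `feature_bundle` are assumptions chosen for the example, not part of any cited model.

```python
# Illustrative ontology: each concept names its is-a parent and its own features.
# The concrete concepts and features are assumptions chosen for the example.
ONTOLOGY = {
    "animal": {"is_a": None, "features": {"breathes": True}},
    "bird": {"is_a": "animal", "features": {"flies": True}},
    "canary": {"is_a": "bird", "features": {"sings": True, "color": "yellow"}},
}


def ancestors(concept):
    """Return the chain of superconcepts reached by following is-a links."""
    chain = []
    parent = ONTOLOGY[concept]["is_a"]
    while parent is not None:
        chain.append(parent)
        parent = ONTOLOGY[parent]["is_a"]
    return chain


def feature_bundle(concept):
    """Collect features from the most general concept down to the given one."""
    bundle = {}
    for name in reversed([concept] + ancestors(concept)):
        bundle.update(ONTOLOGY[name]["features"])
    return bundle
```

In this sketch, `feature_bundle("canary")` combines the canary's own features with those stored at the bird and animal levels, so a question such as "does it fly?" is answered one level up the hierarchy.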

The choice of categories often depends on the intended use of the representation.
Where the weight of persons makes no difference, it is not useful as a
category for their description, whereas nationality may be seen as important.
This occurs in a domain such as taxation, where foreigners may be subject
to other procedures than nationals, while the weight of the taxpayers
makes no difference.

Humans order concepts and real-world objects according to varying criteria 
and may obtain different classifications based on situation-specific needs. 
Whereas Fig. 2.5 reflects predominantly the biologist’s ontology, a chef’s or
housekeeper’s knowledge-organizing questions may be quite different: Is the
food in season? Does it go with the white wine we have? They may put
salmon, shark, and pheasant into one class but not consider canaries at all. 
Whether a bird can sing, or is able to fly, or whether a fish can bite is irrele- 
vant too when they are in the frying pan. Properties of things and concepts are 
reorganized depending on the current situational background, i.e., concepts are 
experienced differently. In other words, they show different qualia (PUST93). 
On the whole, memory cannot be a passive store of knowledge, but reworks 
and reorganizes knowledge to fit situational needs. In spite of their shortcom- 
ings when compared to dynamic human memory, static ontologies remain use- 
ful instruments for arranging computerized knowledge bases. 



Table 2.1. Basic categories (adapted from LEVE89)

Question                     Answer                       Category

Who dropped the milk?        Peter                        person
What did you get?            A book                       thing
What happened?               I lost my purse              event
What did Peter do?           Switch the light on          action
What was the case?           The workers were on strike   state
When was the fire?           Yesterday                    time
Where was the flood?         In Holland                   place
Where did he point?          Toward the castle            direction
What color was the house?    White                        attribute
How did he travel?           By plane                     manner



Many different relations may supplement the is-a links that form the backbone 
of an ontology: causal relations are common, has-property links govern the in- 
ternal semantic structure of a concept (compare the arrows in Fig. 2.5), and 
spatial relations (behind, in front of, etc.) are of evident interest in real-world 
situations. In accordance with the needs of particular domains, we may expand 
the relation set far beyond the examples mentioned. 

In human memory, the strength of relations may vary. Experimental evidence 
suggests that test subjects take different lengths of time running through rela- 
tions when retrieving concepts. Strong connections allow for faster access, as 
does an environment of primed (previously activated) concepts. In the human 
mind, not only ontologies as wholes, but also their relations are of limited
stability and adapt to the situational environment. In addition, it has been
observed that central concepts of a class are recognized more speedily than
marginal ones. According to a standard example, it takes less time to decide
whether a sparrow is a bird than whether an ostrich is. An explanation is that
the sparrow is a prototypical bird sharing many bird properties (it flies, 
sings,...), while the ostrich has less obvious bird-like characteristics and thus 
incurs time-intensive extra checking during categorization. Again the link sets 
used in all sorts of computerized knowledge bases and other concept lexica are 
simpler approximations to human memory structure. 



Fig. 2.5. An ontology of animals (adapted from COLL69)



2.4.2 Propositions 

As we are able to remember not only single concepts, but also episodes from 
our personal experience, or theories that relate many knowledge items, our 
knowledge must be organized in structures that link individual concepts. 
Propositions are such a structure. They combine concepts of different catego- 
ries to characterize a more complex knowledge unit. The predicate typically 
specifies how the other concepts - the arguments - are related. 

For instance, the following proposition represents a situation: 

observe (Livia, plane) 

“Livia observes a plane.” 

Inside the proposition or predicate, the first item characterizes a process or 
state of affairs, the second one a person and the third a thing. The first item - 
the predicate - moulds the relations between the other meaning units, the
arguments of the predicate.

The categories of experience combine in systematic ways: a thing and a place
can combine into a state (“the Tower is in London”), a person and an action
combine into an event (“Livia observes a plane” or “the lady won the race”)
and so on. While simple propositions can represent knowledge ranging from 
features (“is-blue”) to items corresponding to a kernel sentence, embedded 
propositions, or structures of propositions linked by semantic relations, can 
specify the meaning of larger semantic units up to whole texts. 

Propositions can be embedded into matrix propositions. For instance: 

see (Helen (put (Sandy, book, “onto”, shelf))) 

“Helen sees that Sandy puts a book onto the shelf.” 

know (Karen (observe (Livia (take-off (plane, Sydney)))))

“Karen knows that Livia observes a plane taking off for Sydney.” 

Figure 2.6 shows the representation of embedded events in the form of a tree 
structure. The different categories are indicated by uppercase letters, whereas 
their fillers, the instances or objects of observation, are written in lowercase 
letters. 



EVENT
    see
    PERSON
        Helen
    EVENT
        put
        PERSON
            Sandy
        THING
            book
        PLACE
            onto
            THING
                shelf

Fig. 2.6. see (Helen (put (Sandy, book, onto, shelf))) in tree presentation
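
Embedding one proposition in another maps naturally onto a recursive data structure. The following Python sketch is one possible encoding, not the notation of the cited literature: the class `Proposition` and the helpers `depth` and `render` are hypothetical names, and `render` writes a comma-separated form that differs slightly from the notation used in the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Proposition:
    """A predicate with arguments; an argument may itself be a proposition."""
    predicate: str
    arguments: Tuple[Union[str, "Proposition"], ...]


def depth(prop):
    """Number of nested proposition levels, counting the matrix proposition."""
    nested = [depth(a) for a in prop.arguments if isinstance(a, Proposition)]
    return 1 + max(nested, default=0)


def render(prop):
    """Write the proposition in a predicate(argument, ...) form."""
    parts = [render(a) if isinstance(a, Proposition) else a for a in prop.arguments]
    return f"{prop.predicate}({', '.join(parts)})"


# "Helen sees that Sandy puts a book onto the shelf."
put = Proposition("put", ("Sandy", "book", "onto", "shelf"))
see = Proposition("see", ("Helen", put))
```

Here `render(see)` yields `see(Helen, put(Sandy, book, onto, shelf))`, and `depth(see)` is 2 because the matrix proposition contains one embedded proposition.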



The following example taken from KINT75 demonstrates a propositional text 
representation. The text deals with the relationship of Greeks and Romans:

“The Greeks loved beautiful art. When the Romans conquered the Greeks, they
copied them, and thus learned to create beautiful art.” 

1. (love, Greek, art)
2. (beautiful, art)
3. (conquer, Roman, Greek)
4. (copy, Roman, Greek)
5. (when, 3, 4)
6. (learn, Roman, 8)
7. (consequence, 3, 6)
8. (create, Roman, 2)



Fig. 2.7. Text meaning representation 



In Fig. 2.7 propositions are used to state facts. They also serve as arguments for 
other propositions. In this way, more complex propositions are built up. We 
also find propositions that use a semantic relation as predicate and propositions 
as arguments (compare numbers 5 and 7). They state semantic relationships 
between propositions. Extending this approach, propositional representations for 
a discourse as a whole have been developed. Sets of discourse relations are 
found, for example, in HOVY93. Notational alternatives, e.g., a graphical pre- 
sentation, may be needed in order to display larger representations. In Fig. 2.7 
the indentation is such a graphical element. It renders levels of text organiza- 
tion. Statements that add further features to objects are seen as belonging to a 
lower level of meaning organization. The main proposition of the text is propo- 
sition 1, whereas the propositions 6, 7, and 8 belong to the third hierarchical 
level of text meaning. 
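
The way numbered propositions serve as arguments of other propositions can be made explicit with a small Python sketch. It stores the eight propositions of Fig. 2.7 in a dictionary and replaces numeric references by the propositions they point to. The names `TEXT_BASE` and `expand` are assumptions made for this illustration, and the level structure expressed by indentation is not modeled.

```python
# Propositions of Fig. 2.7; integer arguments point to other propositions.
TEXT_BASE = {
    1: ("love", "Greek", "art"),
    2: ("beautiful", "art"),
    3: ("conquer", "Roman", "Greek"),
    4: ("copy", "Roman", "Greek"),
    5: ("when", 3, 4),
    6: ("learn", "Roman", 8),
    7: ("consequence", 3, 6),
    8: ("create", "Roman", 2),
}


def expand(number):
    """Replace numeric references by the propositions they point to."""
    predicate, *arguments = TEXT_BASE[number]
    expanded = [expand(a) if isinstance(a, int) else a for a in arguments]
    return (predicate, *expanded)
```

For example, `expand(6)` returns `("learn", "Roman", ("create", "Roman", ("beautiful", "art")))`, which spells out that the Romans learned to create beautiful art.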



2.4.3 Larger meaning units: Schemata, frames, scripts, and 
memory organization packets (MOPs) 

Schemata. From a cognitive psychology point of view, the core function of 
schemata (BART32, RUME77, THOR79, BREW84, RUME84) is to organize 
information into reusable packages that comprise a fixed core and variable 
aspects. Hence, a schema for buying something in a shop would have as a rela- 
tively fixed feature the exchange of money and goods, but as variable the 
amount of money and the actual goods. Variables may be left unspecified. 
They are often filled by default values representing the best guess given the 
information available. Default values can be overridden by better knowledge. 
The use of default knowledge is frequent in commonsense reasoning: When we 
understand the utterance Linda drank her coffee, we assume that the coffee 
was hot, unless it is otherwise specified. As soon as we learn that the coffee
was iced that particular day, we replace the default assumption by the positive
knowledge that it was iced. 

Schemata help in establishing meaning, i.e., in understanding, learning, and 
remembering, since they encapsulate what a subject knows about a particular 
aspect of the world. Schemata exist at all levels of abstraction and concrete- 
ness. They overlap and (re)organize a person’s world knowledge from many 
different viewpoints, thus supporting the active recognition of items that 
through some feature resemble a concept which is already known. They are the 
infrastructure of flexible thinking, because they aid multiple interpretation of 
concepts from different perspectives. Schemata also support learning from ex- 
perience because they provide several frames of existing knowledge for attach- 
ing a new item. 

Frames. What is investigated as a schema in psychological research often ap- 
pears under the term frame (MINS75) in computational approaches. A simple 
frame representation is shown in Fig. 2.8. Frames bundle a set of features under 
a common name. Every feature is assigned a labeled slot in which values may
be stored. A slot may also carry additional information, in particular so-called slot
demons, i.e., procedures that are triggered when something happens to the slot,
e.g., when a value is changed. Slot names define features or relations of the
specified concepts. Among them, classificatory relations state where a concept
is located in an ontology.



coffee-drinking-event
    is-a:     consuming-hot-beverages-event
    type:     process
    activity: drink
    actor:    Linda
    object:   coffee



Fig. 2.8. Simple frame representation 



Figure 2.8 includes the central is-a relation. It tells us the superconcept of the 
current frame. From its superconcepts, the current concept may inherit defaults 
and other general values. For instance, the event of coffee drinking may inherit 
from a more general event frame named consuming hot beverages the default 
knowledge that coffee is normally hot when it is ingested. 
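
Default inheritance along the is-a relation can be illustrated with a minimal Python sketch. The frame contents follow Fig. 2.8, but the slot name `temperature`, the dictionary `FRAMES`, and the function `get_slot` are assumptions introduced only for this example. Slot demons are left out.

```python
# Frames as dictionaries. The slot "temperature" is an assumption
# introduced only to illustrate how a default is inherited.
FRAMES = {
    "consuming-hot-beverages-event": {"temperature": "hot"},
    "coffee-drinking-event": {
        "is-a": "consuming-hot-beverages-event",
        "type": "process",
        "activity": "drink",
        "actor": "Linda",
        "object": "coffee",
    },
}


def get_slot(frame_name, slot):
    """Look up a slot locally, then follow is-a links to inherit a default."""
    while frame_name is not None:
        frame = FRAMES[frame_name]
        if slot in frame:
            return frame[slot]
        frame_name = frame.get("is-a")
    return None
```

As long as the coffee-drinking frame has no `temperature` slot of its own, `get_slot("coffee-drinking-event", "temperature")` returns the inherited default `"hot"`. Storing `"iced"` in the local slot overrides the default, which mirrors the replacement of a default assumption by better knowledge.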

Scripts. The main virtue of schemata is to pack knowledge into suitable larger 
units. Among them, those for storing knowledge of commonly experienced social
events are called scripts (SCHA77). The well-known prime example of a
script is the restaurant script (BOWE79 - see Fig. 2.9). It states the common
cultural assumptions that guide a customer at a restaurant. Other comparable 
scripts may state how to eliminate a program bug or how to submit a proposal 
to a funding agency. 

Scripts state what we already know. They are essentially conservative and 
bound to normal events. The restaurant script, for instance, has no facilities to 
store a fiddler coming along, or to deal with a cook on strike or in love. In 
these events, however, no restaurant customer would fail to refer to a strike or 
person-in-love schema to understand what is going on. In contrast to scripts, 
humans tend to foreground the exceptional experiences in their memory. They 
flexibly construct sets of memory schemata that fit the situation, often by 
adapting schemata that rule a somewhat analogous case, for instance transfer- 
ring experiences from an airline strike to a strike of the restaurant personnel. 



name: restaurant

roles: customer, waiter, cook, cashier, owner
props: tables, menu, food, bill, money

entry conditions:
    customer is hungry
    customer has money

results:
    customer has less money
    owner has more money
    customer is not hungry

scene 1: entering
    customer enters restaurant
    customer looks for table
    customer decides where to sit
    customer goes to table
    customer sits down

scene 2: ordering
    customer picks up menu
    customer looks at menu
    customer decides on food
    customer signals waitress
    waitress comes to table
    customer orders food
    waitress goes to cook
    waitress gives food order to cook
    cook prepares food

scene 3: eating
    cook gives food to waitress
    waitress brings food to customer
    customer eats food

scene 4: exiting
    waitress writes bill
    waitress goes over to customer
    waitress gives bill to customer
    customer gives tip to waitress
    customer goes to cashier
    customer gives money to cashier
    customer leaves restaurant

Fig. 2.9. The restaurant script (adapted from BOWE79)
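
A script can be held in a simple nested structure, which also makes its conservative character visible: events that are not listed simply have no place in it. The Python sketch below uses a reduced version of the restaurant script with two of its four scenes. The names `RESTAURANT_SCRIPT` and `locate` are assumptions made for this illustration.

```python
# A reduced restaurant script with two of the four scenes from Fig. 2.9.
RESTAURANT_SCRIPT = {
    "name": "restaurant",
    "entry_conditions": ["customer is hungry", "customer has money"],
    "scenes": {
        "entering": [
            "customer enters restaurant",
            "customer looks for table",
            "customer decides where to sit",
            "customer goes to table",
            "customer sits down",
        ],
        "eating": [
            "cook gives food to waitress",
            "waitress brings food to customer",
            "customer eats food",
        ],
    },
}


def locate(script, event):
    """Return (scene, position) for an expected event, or None if unscripted."""
    for scene, actions in script["scenes"].items():
        if event in actions:
            return scene, actions.index(event)
    return None
```

With this encoding, `locate(RESTAURANT_SCRIPT, "customer eats food")` returns `("eating", 2)`, whereas an event such as a fiddler coming along yields `None` and would have to be understood through some other schema.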



A dynamic model of memory. A dynamic model of memory is proposed by
SCHA82. There, knowledge is bundled into different sorts of structures including
plans, scenes, memory organization packets (MOPs), and thematic organization
points (TOPs).

Plans comprise specific motivations and goals. Scenes represent the general 
structure within which particular actions take place, and comprise a setting, a 
goal, and the actions leading to a goal. The reader may, for example, have the 
goal of getting something cold to drink and go to the refrigerator. Each scene 
may form part of one or probably many memory organization packets (MOPs). 
This is true even for the simple refrigerator scene. As soon as the refrigerator 
door is opened, its internal organization must be included in the memory repre- 
sentation of the event: realizing that bottles - unlike carrots - are normally not 
found in the vegetable compartment eases the retrieval of a drink. A packet 
dealing with the available cold beverages will also go into the MOP: milk may 
not correspond to your taste, and the kids may have taken the last bottle of or- 
ange juice. MOPs are organized into metaMOPs on higher-level structures: 
from sad experience, we may have a standard metaMOP for dealing with un- 
fortunate choices such as the milk we do not like and the orange juice that has 
disappeared. In addition, our memory is equipped with thematic organization 
points (TOPs) representing higher-level analogies between situations that are 
different in detail but related in structure: the experienced parent may recog- 
nize the TOP family running out of stock of good x and put good x - the orange 
juice - onto the shopping list. 



2.4.4 Integrated representation 

According to everything we know, the knowledge organization in memory can- 
not be completely prestored. It is more realistic to assume that memory struc- 
tures are generated or reworked on demand in the task situation. This presup- 
poses that different memory stores or representation formats are used in an in- 
tegrated and dynamic way (KINT85, KINT88). 

The meaning of a document or text is understood in terms of prior knowl- 
edge. Practically speaking, its representation is formed by selecting, modify- 
ing, and rearranging propositional elements from general knowledge. However, 
text representations are separate structures much like episodic knowledge. 
Most situations represented by a text deal not with abstract general concepts, 
but with their concrete real-world instances. 

In Fig. 2.10 some of the interesting semantic relationships of integrated re- 
presentations are shown. First of all, representations of episodes (discourses or 
events) remain intact units. They are anchored in the general knowledge of the 
subject without being dissolved. The relations between general knowledge 
concepts and their instances in episodes are set up when the subject under- 
stands the event in terms of her or his own prior knowledge. Each episode also 
finds its place among other events which are remembered, related to them by 
contrasts and similarities. 







[Figure labels: general knowledge; discourse representation]

Fig. 2.10. General knowledge and episodic representation (from KINT88)



[Figure nodes: like (ag: person, obj: (eat (ag: person, obj: cake))); bake (ag: person, obj: bricks). Legend: positive link; negative link]

Fig. 2.11. Integrated representation in associative memory (adapted from KINT88)

Memory may also be seen as an associative network. Figure 2.11 shows a 
small illustrative number of nodes in the knowledge net. These nodes are con- 
cepts or propositions. The connections between nodes may differ in strength, 
and they may be positive or negative (inhibitory). Nodes are configured
much like frames and propositions. They consist of a head plus a number of 
slots for arguments. The slot specifies the nature of the relation between the 
head and the argument. Slots may represent different types of arguments:
attributes, parts, cases of verbs, and so on. Concepts (or lexical nodes) may have
associated perceptual procedures that identify certain patterns in the environ- 
ment - either the objects themselves or written or spoken words that refer to 
them, such as Mary and cake. 

Meaning is constructed in the knowledge net by relations. The immediate semantic
neighbors and associates of a node constitute the core meaning of a
concept or proposition; more distant nodes contribute less. The more semantic
relations have been installed, the deeper a proposition has been understood, 
and the more meaningful it is to the subject. In addition to the network of se- 
mantic interconnections, the perceptual procedures link items to the outer non- 
symbolic world and thus convey meaning to them. 
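
A knowledge net with links of different strength and sign can be sketched as a weighted graph. In the Python example below, all node names and weights are invented for illustration, and treating only the positively linked neighbors as the core meaning of a node is a simplification of this sketch rather than a claim of the cited model.

```python
# Undirected weighted links; all node names and weights are illustrative assumptions.
LINKS = {
    ("cake", "eat"): 0.9,
    ("cake", "bake"): 0.7,
    ("bake", "bricks"): 0.4,
    ("cake", "bricks"): -0.5,  # negative (inhibitory) link
}


def neighbors(node):
    """Map each directly linked node to the signed strength of the link."""
    result = {}
    for (a, b), weight in LINKS.items():
        if a == node:
            result[b] = weight
        elif b == node:
            result[a] = weight
    return result


def core_meaning(node):
    """Immediate positive associates of a node, strongest first."""
    linked = neighbors(node)
    positive = [n for n, w in linked.items() if w > 0]
    return sorted(positive, key=lambda n: linked[n], reverse=True)
```

Under these assumed weights, `core_meaning("cake")` returns `["eat", "bake"]`, while the inhibitory link keeps `bricks` out of the result.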



2.4.5 Procedural knowledge 

Procedural knowledge encodes skills or how-to knowledge such as the cogni- 
tive activities needed in translation or the motor programs that steer articula- 
tion or make a hand pick up a screw. In addition to representation forms such 
as plans and scripts, IF - THEN - ELSE rules (also called production rules) 
are used to specify how-to knowledge. For instance, we may state a central 
knowledge item about boiling water like this: 

IF the water is boiling 
THEN turn off the heater 
ELSE add heat and wait 

Procedural knowledge may be represented by computer programs that simulate 
the intended performance, for example by device drivers that make a robot 
hand pick up a screw. 
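
The boiling-water rule translates directly into a conditional, and a small loop shows how such a rule drives behavior when it is applied repeatedly. This Python sketch is only an illustration: the function names, the starting temperature, the step size, and the boiling point are assumed values.

```python
def boiling_rule(water_is_boiling):
    """Return the action selected by the IF-THEN-ELSE rule for boiling water."""
    if water_is_boiling:
        return "turn off the heater"
    return "add heat and wait"


def heat_until_boiling(temperature, step=10, boiling_point=100):
    """Apply the rule repeatedly; temperatures and step size are assumed values."""
    actions = []
    while True:
        action = boiling_rule(temperature >= boiling_point)
        actions.append(action)
        if action == "turn off the heater":
            return actions
        temperature += step
```

Starting at an assumed 80 degrees, `heat_until_boiling(80)` selects "add heat and wait" twice and then "turn off the heater".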



2.5 Understanding



2.5.1 Introduction: General assumptions about discourse processing 
and understanding 

We may describe discourse comprehension as the understanding of more and 
more complex units: first, words are understood, then clauses in which these 
words have various functions, then complex sentences, sequences of sen- 
tences, and overall text structures. The reality of discourse comprehension is, 
however, that there is continuous feedback between understanding less
complex and more complex units. Understanding the function of a word in a clause
will depend on the functional structure of the clause as a whole, both at the
syntactic and at the semantic level. 

According to the concepts put forth in KINT83, during text comprehension a 
text base is constructed in the memory of the understander. It represents the 
propositions in the text and reconstructs their textual coherence, i.e., the se- 
mantic links between them. Only insofar as the text base represents the overall 
textual context has the reader understood the text. It is frequently the case that 
during a first reading of a text only some of the semantic relations between 
text units are recognized. Understanding remains local and incomplete, be- 
cause the understander is too busy with small-scale comprehension to think 
about the global importance of meaning units. However, as soon as possible, 
meaning hypotheses are formed. Incrementally and in cooperation, the cogni- 
tive strategies work through the text base and improve its state of representa- 
tion and integration. 

Macrostructures and superstructures (discourse schemata). The holistic 
semantic structure or gist of a text is also called macrostructure. It is made up 
of macropropositions which are derived from the micropropositions (normal 
propositions) of the text surface through macrostrategies (see Fig. 2.13). The 
macrostrategies summarize a group of propositions under one macroproposi- 
tion, thus constructing a shorter and more abstract version of the text. In good 
texts, many of these macropropositions appear on the text surface as topic sen- 
tences and thus facilitate understanding. Through repeated summarizing during 
understanding, we arrive at the topic sentence of the entire text, the highest 
macroproposition in the hierarchy, which usually corresponds to the title. As 
soon as all propositions of the text have been connected to the theme (or the 
thematic structure), the text has been understood. 

Schemata also apply to the form of a discourse, be it a tale or a scientific 
paper, as well as to other knowledge packages. These text type-specific sche- 
mata are sometimes called superstructures. In the early days of discourse 
structure research, schemata of folk tales were explored and stated in the form 
of story grammars using rewriting rules (RUME75). Obviously, folk tales are 
not the only text type to possess a clear superstructure (information organiza- 
tion schema or document structure). Many expository texts in science and 
technology are just as structured. As an example, an empirical news discourse 
schema is presented in Fig. 2.12. The document architectures presupposed by 
standard markup languages such as SGML would be unthinkable without
viable discourse structures. 

Strategies. Text comprehension involves different types of strategies. Strategies
are pieces of goal-oriented behavior that try to reach their aims with
appropriate means (see Sect. 2.3.3 above). Since they are bound to specific
goals and many goals must be reached before a noteworthy text processing
task is accomplished, strategies must work hand in hand, considering the same
data from different viewpoints.



[Figure node labels: news discourse (weekly format), summary, report,
background (general situation), actual situations, history, historical context,
previous events, episode, actual context, events, main events, consequences]

Fig. 2.12. News discourse schema (adapted from KINT83)



Text processing strategies belong to different functional classes. The most im- 
portant of them are: 

• Propositional strategies. As the name suggests, propositional strategies in- 
terpret sentences and transfer them into a propositional representation. They 
are thus the counterpart to a parsing component. 

• Strategies of local coherence. Strategies of local coherence work out how 
adjacent propositions are linked semantically. The most important condition 
for coherence is that the propositions refer to coherent events in a possible 
world. To express this, utterances frequently have the same referent. 
Conjunctions indicate the relations, or they ensue from the sequence of the 
propositions. 

• Macrostrategies. Macrostrategies infer macropropositions from sequences of
propositions. Macropropositions can be further subsumed to finally represent
the global meaning of the text on various macrostructure levels
(see Fig. 2.13).


Fig. 2.13. Discourse processing by cooperating strategies 



• Schema strategies. Schema strategies make use of the fact that many text 
types have a conventional structure, a text type-specific superstructure as 
shown in Fig. 2.12. Language users are familiar with these superstructures. 
They use them to derive expectations which direct the understanding pro- 
cess. An understander tries as quickly as possible to form a hypothesis 
about the global text schema. Various cues allow her or him to recognize 
the text type, for instance the publication in which the text appears or an 
explicit text type stated in the title. 

• Production strategies. The production of text is just as strategic as text un- 
derstanding. Production strategies cannot be separated from comprehension 
strategies, because text understanding is mixed with production as text 
production is intermingled with comprehension (see Sect. 2.6). 

Production strategies start with a mental representation of potential text 
knowledge. From the mental representation they first of all construct a text
plan and then deal with formulating the contents. To begin with, a global 
text plan is set up taking account of the communication situation. This 
macroplan controls the remaining production process. A macroproposition is 
developed, whose topic is taken from episodic memory. From now on 
inverted macrostrategies operate. They add details to the macroproposition, 
break general statements down into more specific ones, and analyze
complex actions into their parts. The semantic macrostructure of the text to
be produced is thus already known at the start of formulation and serves as 
a global production plan for the text meaning. In realizing the text, prag- 
matic principles are observed. In general, nothing is stated that the listener 
already knows. The schematic superstructure provides the cornerstones for 
the linearization of the text meaning structure. As soon as the propositions 
of the text are available and ordered in a linear sequence, their surface 
structure can be produced. 

• Strategies of knowledge use. Strategies of knowledge use work with domain 
knowledge taken from memory, for example in order to correctly interpret 
an utterance. 



2.5.2 Understanding during reading 

A frequent mode of discourse understanding is reading comprehension. Reading 
has attracted considerable attention from researchers. Using eye movement 
data, JUST80/87 investigated the reading and understanding of texts from opti- 
cal perception to the semantic representation of sentences or short texts. They 
found the following principles of understanding processes: 

• Fixation of a word is the basic unit of eye movement during reading. 

• The reader processes a word immediately when it is perceived. This im- 
mediate interpretation is not restricted to the lexical meaning of the word, 
but includes its function in the current sentence and in the text. 

• Fixation of a word takes as long as its cognitive processing. 

Information intake and processing overlap in time. Five processing stages may 
be distinguished (see Fig. 2.14):

1. Get next input

2. Word encoding and lexical access 

3. Case role assignment 

4. Interclause integration 

5. Sentence wrap-up 

The optical impression that is provided by the eye is translated into visual fea- 
tures and deposited in the working memory. It is then translated into a meaning 
representation by activating an appropriate word meaning. From several possible
meanings one is selected immediately. Assignment to a referent also takes
place as quickly as possible. In the next execution phase, a semantic role for 
the new word is determined in the sentence. If required, the relations between 
clauses and sentences are established. In other words, the new element is inte- 
grated into the local text microstructure. In doing this, the reader sets the new 
unit of meaning in relation to already known units and updates the text base. 
Since important parts of the meaning of a text, such as the topic, are often
referred to for linking new elements of meaning, they are processed more
frequently and deeply. Accordingly, the integration of important units of meaning
takes longer. The wrapping-up of a sentence requires additional cognitive effort 
observable in the form of fixation times. 
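
The principle that a fixation lasts as long as the processing of the fixated word can be illustrated with a toy calculation. The Python sketch below sums assumed costs for the five stages listed above. The millisecond values are invented, and tying interclause integration to clause-final words is a simplifying assumption of this sketch, not a claim of the cited studies.

```python
# Stage costs in milliseconds are invented for illustration only.
STAGE_COST_MS = {
    "get next input": 50,
    "word encoding and lexical access": 100,
    "case role assignment": 60,
    "interclause integration": 40,
    "sentence wrap-up": 80,
}


def fixation_time(is_clause_final, is_sentence_final):
    """Sum the costs of the stages a word passes through while it is fixated."""
    stages = [
        "get next input",
        "word encoding and lexical access",
        "case role assignment",
    ]
    if is_clause_final:
        stages.append("interclause integration")
    if is_sentence_final:
        stages.append("sentence wrap-up")
    return sum(STAGE_COST_MS[s] for s in stages)
```

With these assumed costs, a word inside a clause is fixated for 210 ms, while a sentence-final word takes 330 ms because integration and wrap-up add to its processing.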

THIB82 and JUST84 describe READER, a simulation model of normal read- 
ing. READER reads a 140-word text about flywheels and constructs a mental 
representation of it. The representation is sufficient for summarizing the pas- 
sages and answering questions. 




Fig. 2.14. Reading and understanding (adapted from JUST84) 



2.5.3 Understanding as knowledge acquisition from text 

We may ask how the incoming meaning units interact with and update the 
preexisting mental model of the understander. From this perspective, under- 
standing appears as knowledge acquisition - the construction or update of a 
mental model. Figure 2.15 shows the reading process working step by step 
through the input text, acquiring data units that update the understander’s 
knowledge of the domain. 

First of all, the units of the text are translated piece by piece into a
propositional representation through a semantic-syntactic analysis. This propositional
representation yields the material for the mental model. For text understanding,
different kinds of prior knowledge are needed. These include linguistic knowl- 
edge, factual knowledge and inferential knowledge. Prior knowledge is organ- 
ized in the form of cognitive schemata. These schemata are activated by text 
information. They can in turn activate subordinated schemata in a descending 
order. The mental model of the discourse thus emerges as a constellation of ac- 
tivated schemata that correspond to the best interpretation of the text that the 
reader has managed to achieve. 

During its incremental construction, the temporary mental model of the text 
content is constantly being re-evaluated. It is checked for coherence, corre- 
spondence to prior knowledge, and completeness. An interaction develops be- 
tween the information offered by the text and the information requirement of 
the reader defined by gaps in the mental model. Gaps in the text information 
are closed by elaborating the content of the text where this appears necessary. 
In the case of acute differences, a temporary model may be rejected. This 
makes the understander regress in the discourse and attempt to construct a 
better interpretation.



Fig. 2.15. Knowledge acquisition from text (adapted from SCHN88)



2.6 Discourse production



There is no strict borderline between understanding utterances and the produc- 
tion of utterances. They accompany each other. It is commonplace to say that 
every text production process is interspersed with understanding and reading 
activities. A speaker monitors and understands her or his own performance. A 
writer also checks, rereads, and interprets every single word. This happens con- 
tinually during drafting and even more predominantly during systematic revi- 
sion. If the reader or listener is at work and understanding is the main task, this 
is often supported by utterances. Children first read aloud, and it takes them 
time and exercise to manage silent reading. With comprehension problems, 
even adults return to speaking out loud, repeating the items that cause trouble 
in understanding, or they take notes to facilitate understanding. 

Figure 2.16 shows a model of a speaker that accounts for the real-world par- 
allelism of speaking and understanding by including comprehension in the 
cognitive activities of a communicator who is talking. The model consists of a 
number of processing components, such as the conceptualizer or the formula- 
tion component, each of which accepts a certain kind of input and produces a 
characteristic output, deriving processing from its central knowledge structure. 
The output of one component can become the input for another. A processing 
component may itself consist of subcomponents of varying degrees of autonomy.
Inside the formulator, we might find, for instance, a specialist for English
formulation (as opposed to French formulation) and, within that, an even more
specialized component for building English relative clauses.

As described above for communicators in general, speakers need metacogni- 
tive abilities to monitor their own production. They can understand their own 
speech and monitor the preverbal stage and the not-yet-executed articulation 
plan. As speakers can monitor their planned speech, it must be representable in 
working memory. The hypothesis in Fig. 2.16 is that internal speech is ana- 
lyzed by the same speech comprehension system as overt speech. Thus speak- 
ers can detect trouble before fully articulating the troublesome element. When 
detecting serious problems with respect to the meaning or well-formedness of 
an utterance, they may decide to rerun the same preverbal message or a frag- 
ment of it, create a different or additional message, or just continue formula- 
tion without alteration, all depending on the nature of the trouble. The repair 
processes are of the same nature as what normally goes on in message con- 
struction. 

A cooperative speaker’s contributions are presumed to be relevant to the on- 
going discourse, such that the discourse as a whole remains coherent. This re- 
quires discourse bookkeeping on the part of both speakers. They have to record 
what is conveyed by both parties. Monitoring cannot be restricted to their own 
performance; indeed, the whole situation must be observed. The speaker’s
internal representation of the discourse is by nature a dynamic entity. It changes
with each new contribution made to the conversation, whether by the speaker
or by another participant. The speaker’s record is more than a superficial trace
of all utterances made; it is a structured interpretation of what happens in the
conversation. Some aspects may be encoded deeply (in long-term memory)
whereas others are transient, i.e., kept in working memory for only short periods
of time.




[Figure labels: formulator (grammatical encoding, surface structure,
phonological encoding); lexicon; phonetic plan; articulator; overt speech;
audition; phonetic string; parsed speech]

Fig. 2.16. A speaker model (adapted from LEVE89)



Talking is an intentional activity. It involves conceiving an intention, selecting 
the relevant information to be expressed for the realization of this purpose, or- 
dering this information for expression, keeping track of what was said before, 
and so on. The conceptualization of a later utterance is constructed in working 
memory. Working memory contains all information currently available to the 
speaker, or everything (s)he is consciously aware of. Often, however, the 
knowledge at hand is not appropriate for the intended use. It must be reorgan- 
ized before it can be applied. Reorganizing means, for instance, tailoring an 
argument to the current audience by using an example of bravery taken from 
French history in France or taken from Spanish history in Spain. 

In the planning of a preverbal message (i.e., of the discourse meaning) we 
often distinguish two stages: macroplanning and microplanning. Macroplanning 
involves the elaboration of a certain communicative goal into a series of
subgoals, so that the information for each of these subgoals can be retrieved and
expressed. Microplanning assigns the right propositional shape to each
information unit, as well as the informational perspective (the particular topic and
focus) that will guide the recipient’s attention. The output of microplanning for 
each intended speech act is a preverbal message (see Figs. 2.16 and 2.17). 

Although there can be no formulation without some conceptual planning, and 
no articulation without a phonetic plan, message encoding, formulating, and 
articulating can run in parallel because the next processor can start working on 
the still-incomplete output of the current one. 
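
The incremental handover between components can be illustrated with a minimal sketch. It assumes, purely for illustration, that each component is a Python generator and that message fragments are single words; the function names and the slash notation for a phonetic plan are hypothetical and not part of the speaker model. Generators interleave their work rather than running truly in parallel, but they show the key point: a later component starts on the first fragment before the earlier one has finished the whole message.

```python
from typing import Iterator, List, Tuple


def conceptualizer(fragments: List[str], trace: List[str]) -> Iterator[str]:
    """Emit preverbal message fragments one at a time."""
    for fragment in fragments:
        trace.append(f"conceptualize:{fragment}")
        yield fragment


def formulator(messages: Iterator[str], trace: List[str]) -> Iterator[str]:
    """Translate each fragment into a toy phonetic plan (slashes are a stand-in)."""
    for message in messages:
        trace.append(f"formulate:{message}")
        yield f"/{message}/"


def articulator(plans: Iterator[str], trace: List[str]) -> Iterator[str]:
    """Execute each phonetic plan chunk as toy overt speech."""
    for plan in plans:
        trace.append(f"articulate:{plan}")
        yield plan.strip("/")


def speak(fragments: List[str]) -> Tuple[List[str], List[str]]:
    """Chain the three components and record the order in which they work."""
    trace: List[str] = []
    messages = conceptualizer(fragments, trace)
    plans = formulator(messages, trace)
    speech = list(articulator(plans, trace))
    return speech, trace
```

Calling `speak(["the", "sparrow"])` records the articulation of the first fragment before the conceptualization of the second, which is the overlap described above.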

The formulator accepts fragments of messages and produces as output a pho- 
netic or articulatory plan. In other words, the formulator translates a conceptual 
structure into a linguistic structure. The translation includes grammatical en- 
coding and phonological encoding. 

The grammatical encoder consists of procedures for accessing lemmas, and 
for building syntactic structures. The lemma information is stored in the 
speaker’s mental lexicon. A lemma contains the lexical item and its meaning 
or sense, that is, the concept associated with the word. For example, a sparrow 
is defined as a common sort of bird. The syntax of a word is also part of its
lemma information. According to its syntax, the lemma sparrow is categorized 
as a noun. A lemma will be activated when its meaning matches part of the 
preverbal message. This makes its syntax available, which in turn will activate 
phrase and sentence building procedures. When the lemma sparrow is acti- 
vated, the noun phrase building processes are called upon. They may set up a 
noun phrase that, in addition to the head sparrow, may include an article such 
as the and prepositional phrases, clauses, and so on. 
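
Lemma activation can be sketched as a lookup in a toy lexicon. The sketch below is an illustration under stated assumptions: the `Lemma` class, the `activate` and `build_phrase` functions, the bracket notation for phrases, and the second lexicon entry are all hypothetical; only the sparrow example comes from the text.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Lemma:
    word: str      # the lexical item
    sense: str     # the concept associated with the word
    category: str  # syntactic category, e.g. "noun"


# Toy mental lexicon; the second entry is invented for illustration.
LEXICON = [
    Lemma("sparrow", "common sort of bird", "noun"),
    Lemma("sing", "produce musical sounds", "verb"),
]


def activate(message_concepts: Set[str]) -> List[Lemma]:
    """Return the lemmas whose sense matches part of the preverbal message."""
    return [lemma for lemma in LEXICON if lemma.sense in message_concepts]


def build_phrase(lemma: Lemma, article: str = "the") -> str:
    """Call the phrase-building procedure selected by the lemma's syntax."""
    if lemma.category == "noun":
        # A noun lemma calls the noun phrase builder, here with an article.
        return f"NP({article} {lemma.word})"
    return f"{lemma.category.upper()}({lemma.word})"
```

A message containing the concept "common sort of bird" activates the lemma sparrow, whose category then selects the noun phrase builder.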




Fig. 2.17. Representations that feed the formulator (adapted from LEVE89) 



This form is further developed during phonological encoding, which retrieves or 
builds a phonetic or articulatory plan for each lemma and for the utterance as a 
whole. This includes in particular accessing the lexical form information of a 
lemma and reading there its phonological (“sound”) shape. Prosodic features
of the whole output chain may be added. The phonetic or articulatory plan is
not yet overt speech. It is an internal representation of how the planned utter- 
ance should be articulated. This end product of the formulator becomes the in- 
put for the next processing component, the articulator. 

Articulation is the execution of the phonetic plan by the muscles of the res- 
piratory, laryngeal, and supralaryngeal systems. Since the formulator is nor- 
mally somewhat ahead of articulatory execution, the phonetic plan must be 
temporarily stored in the articulatory buffer. The articulator retrieves succes- 
sive chunks of the phonetic plan and unfolds them for execution. Though the 
articulatory plan is relatively independent of context, its execution will, within 
limits, adapt to the varying circumstances of articulation. The outcome of 
articulation is overt speech.



3 Summarizing in Everyday Communication



3.1 Introduction 



In the previous chapter we discussed how communicative acts and cognitive 
activities are embedded in situations. Summarizing is no less situationally de- 
termined than other communication processes, such as producing a TV film, 
writing a research report, or cross-examining a witness in court. It contrasts 
with other communication tasks simply because it produces short discourses 
(“summaries”) that restrict themselves to important information. Like commu- 
nication in general, summarization is not bound to a particular medium, but it 
deals in a specific way with knowledge transported by any appropriate medium 
or media set; a summary may be a multimedia message. 



3.1.1 The summarizing situation 

Figure 3.1 illustrates summarizing embedded in a communication situation. For 
simplicity, the summary is manipulated by only one summarizer and one 
recipient. They could also be seen as representing classes. As in other commu- 
nication situations, communicators who take part in a summary communica- 
tion process must have access to the discourse world to which the summary re- 
fers, otherwise they cannot build up their own representation of the transmitted 
information. 

Since summarizing often takes place in specific contexts, it is important to 
consider situation parameters for describing summarization. Summarizing in 
face-to-face communication differs from summarizing under mass communica- 
tion conditions. In mass communication situations, summaries are produced 
under totally different basic conditions and with different objectives, if we 
compare, for example, summaries in television sports coverage and in-text
summaries in book reviews. Not only are the discourse worlds (sport and
literature) different, but also the media (audiovisual television and written text) and
the people involved (sports reporter and sports fans in front of the TV screen on
the one hand and reviewer and readers of reviews on the other).

Fig. 3.1. Situated summarization



3.1.2 The information to be summarized: Memory representation or 
external information, object representation and discourse representation 

The information to be summarized may either be available as a representation 
in the summarizer’s memory or come from external sources, such as a docu- 
ment. 

In the most straightforward case, summarizers retrieve the summary content 
from their own memory. They then search in their respective mental model for 
the most important pieces of information. Only these are included in the sum- 
mary and passed on to its recipients. In the summary user’s memory, a short- 
ened representation of the object emerges, because only the knowledge 
gleaned from the summary is assimilated and can activate her or his own 
background knowledge. 

Figure 3.1 presents the prototypical case where the source information is 
available as external material. Its presentation form additionally influences the 
acquisition work of the summarizer and the presentation of the summary: the 
source information may encompass difficult-to-interpret tables, it may be 
available as a sound recording, drafted in several languages or it may simply 
unfold as a scene before the summarizer’s eyes. In any case, the summarizer is 
confronted with a different assimilation or knowledge acquisition task. To 
summary users, it makes no difference where the summarizer acquires his or 
her information, provided they restrict themselves to the reception of the
summary. The source information may become important in a follow-up situation,
however, when users want to know in more detail and first-hand what the
summary has told them in brief.

Independently of the representation form, the information source determines 
what is talked about in the summary. When the source information is a well- 
organized document, the information organization in the summary should con- 
form to the original presentation. In this case, it is a practical question whether 
a summarizer reconstructs the representation of the interesting subject matter 
from the source text, de-textualizes and then summarizes it, or deals with the 
content as it is presented by the source document. The advantage is that the 
author has already put the event or state of affairs into a communication-ori- 
ented document format. More often than not, this eases access both for readers 
and summarizers. If the source information is not well organized, there is a 
tension between faithfully reflecting the source and producing a well-structured 
summary. 



3.1.3 The summarizer 

Except in cases where a summary is produced in a dialogue, the summarizer is 
the responsible partner in the summarization situation. Summarizers decide 
about the content and the organization of summaries. Like communicators in 
general, they observe all the factors of the situation and consider as many of 
them as possible, including their own communicative intentions. In day-to-day 
communication, summarization happens routinely. The important point is to 
understand the gist correctly, to really select the important items from the 
point of view of the summary user, to present them in an easy-to-assimilate 
form suited to the medium, etc. 

The core requirement of summarization is to concentrate on the important 
points. At first this sounds simple and obvious, but it is not. The problem is to 
decide what is so important that it should be included in a summary. Hence 
relevance assessment becomes a core issue of summarizing. Assessing the 
relevance of a statement in a given context requires situation-specific knowl- 
edge. However, the situation in which a summary is to be used is often not 
precisely known. Thus, the summarizer must as far as possible weigh all the 
arguments that can be put forward for deciding about the importance of a spe- 
cific piece of information, mixing speculation with rules of thumb and definite 
knowledge. 

Like authoring in general, summarization may be more, or less, demanding. 
Summaries are often embedded in everyday conversations, as when a speaker 
reports on yesterday’s TV movie or the results of a meeting. No special effort is 
required for this style of summarizing. Nevertheless, summarizing is an
intelligent skill, roughly comparable to translation. It does not belong to the
elementary cognitive activities such as seeing, controlling the speech organs, or
memory. On the contrary, these must be presupposed before a “higher” cognitive
skill like summarizing can be developed.

When summarizing becomes a task in its own right instead of a natural
activity in day-to-day communication, it may make high professional and
communicative demands on the summarizer. Where this is the case, the
qualifications and invested effort of the summarizers affect the result: highly qualified
people produce better summaries than, for example, children, who first have to 
learn how to summarize; expertise in the respective field has a positive effect; 
a summary in which a lot of mental effort has been invested will turn out better 
than one that is less well thought out, etc. 

As we move towards written and professional summaries, summarization 
grows more sophisticated and particular features become visible. For instance, 
there are principles of summary formulation. A professional knows and system- 
atically applies them, whereas a casual summarizer does not. Communicators 
who summarize at a professional level of competence need more knowledge 
processing and formulation skills than ordinary interlocutors. A summarizer is 
like a playwright, a bilingual person, or a cartoonist, all of whom possess, 
among other things, more formulation skills than normal interlocutors in every- 
day conversation. In order to represent more professional communicators, the 
general situated communicator of Fig. 2.4 must be packed with additional 
knowledge. 



3.1.4 The target group: The users of summaries 

Summary users are often a difficult target group. As emphasized above, it is 
inadequate to regard them as passive consumers of information. Often enough, 
however, they are only partially known and they are not homogeneous either. 
Since summaries are frequently utility texts, their functionality depends on 
their ability to actually reach their recipients and inform them or help them to 
solve problems. There is no global definition of summaries that are suited to 
children, managers, non-experts, or other groups of people or individuals in 
specific use situations such as preparing a lecture, deciding on a new company 
location, telling a story, or watching TV on a Sunday afternoon. Summarizers 
will, however, tailor their summaries as far as possible to the needs of the tar- 
get audience. They need a partner model of the summary users in order to gear 
the summary to their requirements, similar to the way producers of a TV series 
must take the needs of their audience into consideration. 

Communication problems between summarizer and summary user often re- 
sult from a lack of common knowledge. They occur mostly in communication 
situations of the mass communication type, and are felt by the recipient. For 
example, (s)he may not be familiar with the terminology used in the summary. 
Less easy to recognize, but also quite common, are problems arising because 
recipients are not familiar with the discourse type summary or the conditions of
its production. In such cases, a reader expects a summary to explain a topic to
the same extent as the original text, although this is impossible because of the
discourse type-specific brevity of the summary. Other problems emerge on the
part of the discourse producer. Content-related inadequacies and insufficient
adaptation to the use situation are frequent.



3.2 The process of summarization 



Summarization is the skilled reduction of an information object to its most im- 
portant points. The basic idea of the summarizing process is to take a body of 
information and reduce its size and content to the important points (ALTE91). 
What is summarized is information from any appropriate representation. 
Mostly, but not always, the source representation is external (not in memory). 
Most frequently, the summary is uttered, be it orally or as a short written text. 
Thus summarizing typically includes the following three main subtasks: 

• analyze the input information 

• perform the core summarization task (condensation, abstraction) 

• represent the results in an appropriate form (information presentation). 

This basic organization of the process can be seen in Fig. 3.2. The drawing em- 
phasizes representations. Each subtask takes up a source representation and 
produces a target representation. Every subtask derives its output from the input 
representation. In our minds, representations such as in Fig. 3.2 may well not 
be that separated, but rather dynamically activated parts of one comprehensive 
memory representation. We use the modular view in Fig. 3.2 all the same, 
because it helps us to unravel what happens. 
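
The modular view can be made concrete with a minimal sketch in which each subtask takes a source representation and returns a target representation. Everything in it is an assumption made for illustration: sentences stand in for statement units, the `relevance` function is supplied by the caller because relevance depends on the situation, and the names `analyze`, `condense`, `present`, and `summarize` are hypothetical.

```python
from typing import Callable, List


def analyze(source: str) -> List[str]:
    """Input analysis: split the source into statement units (toy: sentences)."""
    return [unit.strip() for unit in source.split(".") if unit.strip()]


def condense(units: List[str], relevance: Callable[[str], float], keep: int) -> List[str]:
    """Core subtask: keep the `keep` most relevant units in their original order."""
    ranked = sorted(range(len(units)), key=lambda i: relevance(units[i]), reverse=True)
    return [units[i] for i in sorted(ranked[:keep])]


def present(units: List[str]) -> str:
    """Presentation: render the retained units as a short discourse."""
    return ". ".join(units) + "." if units else ""


def summarize(source: str, relevance: Callable[[str], float], keep: int = 1) -> str:
    """Run the three subtasks in sequence, as in the modular view."""
    return present(condense(analyze(source), relevance, keep))
```

With a relevance function that favours statements about the wolf, a three-sentence retelling of Little Red Riding Hood reduces to the sentence that mentions the wolf. The strict sequence is a simplification: in practice the subtasks may interleave.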




Fig. 3.2. The summarization process and its subtasks

The representation suggests that a summarizing process goes through
successive stages of understanding, meaning reduction, and presentation. This image
of the process is didactic because it is easy to grasp and conveys correct in- 
formation, but it is too simple to account for cognitive reality. We use it to 
support our own understanding while bearing in mind that the ultimate truth 
may be much more complicated. In particular, the subtasks may be performed 
in a form of integration and temporal sequence that respects external working 
conditions such as the availability of material, or mental events such as an ex- 
cellent idea springing to mind. 

• The analysis of input information can involve radically different amounts 
of effort. In the case of the evaluation of a complex series of experiments, 
it may occupy the working capacity of a whole department for a long time, 
let us say a year. In the case of a young girl orally recalling what she 
knows about Little Red Riding Hood in response to her kindergarten 
teacher’s questions, the process may take only minutes and require no ex- 
ternal input, because the girl knows the fairytale by heart. 

• The subtask of condensation and meaning reduction may also differ, 
among other things with respect to the competence needed and the goals 
involved. The little girl just recalls from memory what she remembers as 
salient points, possibly concentrating on the bad wolf. Her principle of 
summarization is simple, her intellectual tools also, because she does no 
more than tell what she remembers. For an example at the opposite end of 
the spectrum, let us assume that a research team is busy crystalizing new 
knowledge from the results of a complex series of experiments. They use 
much more sophisticated summarizing and relevance principles, because 
they summarize above all according to the innovation value of their re- 
sults. This presupposes a solid grounding in the field. 

• The text production subtask is simple in the case of a girl recalling Little 
Red Riding Hood. She merely utters a spoken discourse. This may be hard 
for a child, but it is very little in comparison to the effort required on the 
part of the research team that presents its results. Here we imagine reports 
to be written, graphics to be drawn, presentations to be prepared, etc. 

The condensation and reduction process is the specific core subtask of summa- 
rization. This does not apply in the same fashion to the understanding and in- 
terpretation of the source information and to the production of the external 
summary. Task-oriented analysis of input data is a standard activity of informa- 
tion and text processing. The same is true for discourse production. As long as 
we exclude professional summarizing, we can safely treat understanding and 
presentation as prologue and epilogue of summarizing, supposing that the 
normal comprehension and presentation skills are available and suffice: what 
the understanding process delivers is assumed to be appropriate for
summarization, and a summary is considered as a short discourse which poses no
particular presentation problems.

In the literature the difference between meaning reduction and meaning con- 
densation is unclear. A summary indeed conveys condensed information, i.e., 
more information than a normal document of the same size, because it avoids 
redundancy, its formulation is more concise, it concentrates on heavyweight 
information, and so on. Furthermore, information may be explicit and implicit 
in a text. An active recipient can recover what is implicit by reconstructing 
and elaborating the explicit text information, and adding his or her own knowl- 
edge. If this is true, we can transmit information implicitly, almost without a 
word. In a borderline case, a summary may simply mention a method (such as 
KADS) or a system (e.g., ICONCLASS), and experienced scientists can
add from their own resources information to fill monographs or even library 
shelves. 

The question is whether the single words from the summary have really 
transmitted and activated all this information or not. In the first case, names 
such as KADS or ICONCLASS provide extremely condensed information. If we 
choose the second possibility and decide that an intelligent summary user 
imports the bulk of the knowledge while the summary provides only the cue- 
words, there is no particular condensation effect in the summary. Until we have 
better evidence, we shall suppose that the skilled meaning reduction during 
summarizing is centrally a deletion of less important (less relevant) informa- 
tion, although some condensation is also achieved. By and large, information 
given in a summary is very limited, but it is certainly important information. 

Researchers have examined summarization primarily in relation to the 
source information and its meaning structure. Consequently, research work fre- 
quently establishes what characteristics of the source text best explain why 
test subjects remember certain statements when reproducing a text and forget 
others, i.e., do not consider them sufficiently important. Some approaches go 
beyond the text-related argumentation and take into consideration whether a 
statement is also ecologically important, i.e., is experienced by the recipient 
as important in a natural environment. Others discuss more directly how the 
summarization process is executed, providing strategies or operators that ex- 
plain how information is abridged to its most relevant part. 



3.3 What we know about summarizing in everyday life 



In the following, we concentrate on everyday summarization by laypersons in 
academic life. We consider students from the very early grades up to university 
postgraduates summarizing in classroom or laboratory situations. Mostly,
source texts are short narratives, sometimes also short expositions from high
school or college-level textbooks.

We focus on explaining how the source representation is changed (reduced) 
in order to obtain a summary representation. We expect regularly observable 
methods, operators, thinking strategies or the like with which communicators 
manage their meaning reduction task. These operators or other pieces of intel- 
lectual know-how explain how summarization works; they define, so to speak, 
the mechanics of summarizing. Readers will notice that sometimes heteroge- 
neous approaches are reviewed. 

In order to reduce meaning in the right way, the reduction strategies must 
have access to judgements about the importance of meaning items. Importance 
(or relevance) assessments may rely on very different arguments taking into 
consideration the input discourse, the summarizer, the audience of the 
summary, and the situation everybody is involved in. Relevance assessment 
turns out to be the core function of summarizing. 

Research has concentrated on deriving importance from features of the dis- 
course, in particular from the plan of the events or facts reported (by finding 
out the causal network, or the overall pattern of action-based stories), or from 
the genre-typical discourse structure. The contribution of the summarizer and 
the summary user in a concrete situation has received much less emphasis. 
Nevertheless, human beings judge importance under the influence of the cur- 
rent context and the constraints on their cognitive and affective capacities. 



3.3.1 Understanding and summarizing 

Information-reducing techniques in summarizing are closely related to the cog- 
nitive processes of discourse understanding. The central intellectual activity in 
summarizing is the reduction of a representation. It presupposes the construc- 
tion of such a representation, i.e., that some subject matter has been under- 
stood at least in part, possibly by interpreting a discourse. Since people can 
only retain limited amounts of information in their working memory at any one 
time, they must, even in the case of normal discourse understanding, fall back 
on summarizing the partial representations, i.e., reproducing them in the work- 
ing memory in a shortened representation which provides access to the impor- 
tant points. 

The general features of cognitive strategies were explained in Sect. 2.3.3. 
The strategies for discourse processing were discussed in Sect. 2.5. They in- 
clude macrostrategies that are the core operators of summarization, according 
to KINT83. Macrostrategies are purpose-oriented processes of information re- 
duction and organization that build higher-level semantic structures (macro- 
structures). As instruments of normal comprehension, they help to absorb large 
sequences of complex semantic structures such as stories, natural scenes or 
actions, or expositions, by deriving higher-level concepts (the
macropropositions). A few high-level concepts are much easier to manage and to
keep in memory than vast amounts of semantic detail learned from a text. 
Once inferred in a bottom-up manner, a hypothesized macroproposition may be 
used as a top-down device to understand subsequent statements, which in turn 
also provide a check on the correctness of the hypothesis. 

When discourse processing is to yield a summary, macrostrategies serve as 
goal-setting strategies. However, macrostrategies that abridge a discourse effi- 
ciently in a meaningful way need help from all sorts of other discourse process- 
ing strategies (see Sect. 2.5): 

Obviously, the function of organized world knowledge is crucial in the infer- 
ence of macropropositions. Strategies of knowledge use access the individual’s 
knowledge store and thus make sure that meaning reduction takes the right 
direction. States, actions, or events known from a cognitive schema (a frame 
or script) are easier to grasp because knowledge from the schema can be 
exploited, and they can also be reduced more easily under the guidance of the 
schema. 

Syntactic signals often emphasize elements that are locally and perhaps 
globally important. They allow syntactic strategies to mark important elements 
of a discourse. Topic-comment structure may give a hint; connectives like but,
however, because, for example, allow us to guess which of the two units they
link may be more important for constructing a macroproposition. These indica- 
tors from the textual surface help macrostrategies to focus on the right input 
items. 

Stylistic and rhetorical strategies are useful as well in choosing input for 
macrostrategies. Often, macropropositions appear on the text surface as topical 
expressions. They tend to be marked by well-known rhetorical devices (“the 
most pressing thing to do was”, “I conclude that”, etc.) and other stylistic 
means which the author applies to guide the reader’s understanding, offering 
safe author-made macropropositions instead of leaving the reader to fend for 
her- or himself in search of the gist. The rhetorical strategies can note the 
author’s emphasis on important discourse passages and recommend these pas- 
sages for positive selection by a macrostrategy. 
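
A rhetorical strategy of this kind can be sketched as a simple cue-phrase filter. The sketch is a toy under stated assumptions: the two cue phrases are the examples quoted above, plain substring matching stands in for the recognition of rhetorical devices, and the name `recommend_passages` is hypothetical.

```python
from typing import List

# Cue phrases taken from the examples in the text, lowercased for matching.
CUE_PHRASES = ("the most pressing thing to do was", "i conclude that")


def recommend_passages(sentences: List[str]) -> List[str]:
    """Return the sentences the author has marked with a known rhetorical cue."""
    return [
        sentence
        for sentence in sentences
        if any(cue in sentence.lower() for cue in CUE_PHRASES)
    ]
```

The filter only recommends candidates; whether a recommended passage enters the summary is still left to the macrostrategy.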

Schematic or superstructural strategies may also help with the derivation of 
macropropositions, because they tell the summarizer what information contrib- 
utes to the semantic core of the concept, event or text type. Many objects 
(e.g., apartments and computers) have well-known structures that may guide 
description. The same is true for events whose subevents are known. Genres 
(discourse types), such as narratives, have schematic structures that define a 
normal ordering of categories and that set global semantic constraints for them. 
If, indeed, the beginning of a story is formed by a setting, then we anticipate
that the first macroproposition(s) may introduce the participants and specify 
the place and time of the events to come. 

Different situational strategies may derive guesses from the roles of persons, 
from their interests, their cultural background, etc., or they may use scripts of
well-known situations (a wedding, a visit to a restaurant, a parliamentary
debate, an argument at home, etc.) in order to direct interpretation and reduction.
Discourses support the communicative, social, and cultural goals of groups and 
individuals. The topics of the discourses reflect their functions both semanti- 
cally and pragmatically. An understander and summarizer will usually possess 
some representation of the current situation. We therefore expect her or him to 
add situational knowledge to the understanding and macrostructure building 
process, using situational strategies. 

Our consideration leaves us with classes of strategies that may contribute to 
the meaning organization and reduction work of macrostrategies. They bring 
into play different sorts of background knowledge. This is an impressive image, 
but it does not tell us much about how to achieve summarization, as 
summarizers or programmers of summarization functions. Therefore we return 
to the seed idea that three reduction operators (macrorules) are responsible for 
summarization, and we discuss these operators, keeping in mind that they have 
in the meantime been expanded into the more general and more flexible, but 
also less well-defined macrostrategies. 

Macrorules as basic instruments of information reduction. To understand 
how macrorules work, we proceed from the representation of the discourse 
meaning in memory (the result of understanding) and assume a knowledge 
base in propositional representation. It can be reduced by inferences, in 
particular by reduction operators and strategies (macrorules and ma- 
crostrategies as proposed in KINT83). They pick up a sequence of propositions 
from the representation of the incoming discourse, and reduce it to a simpler 
representation with less information, a macroproposition. In the relevant 
literature, definitions of macrorules and their number vary slightly (see 
SHER89 for alternative formulations). KINT83 propose three rules defined as 
follows: 

1. Deletion. Given a sequence of propositions, delete each proposition that is 
not an interpretation condition (e.g., a presupposition) for another proposition 
in the sequence. 

Example: 

We went to the bookstore. It was at the corner. We bought a dictionary.

After deletion: 

We went to the bookstore. We bought a dictionary. 

2. Generalization. Given a sequence of propositions, substitute the sequence by 
a proposition that is entailed by each of the propositions of the sequence. 

Example: 

Father was washing dishes. Mother was working on her new book. The daughter
was busy painting the window frames. (This example is borrowed from SCHN81.)

After generalization: 

The whole family was busy. 

3. Construction. Given a sequence of propositions, replace it by a proposition
that is entailed by the joint set of propositions of the sequence. 

Example: 

John went to the station. He bought a ticket. Then, he took the train to Lon- 
don. At London Paddington he left it. 

After construction: 

John took the train to London Paddington. 

Figure 3.3 shows a meaning representation of the short family status report 
used as an example for the generalization rule. At the same time it gives a re- 
presentation of the reported state of affairs or subject matter. The possible re- 
ductions have been entered as relations between concepts, as well as other 
conceptual relations. 



[Figure 3.3, diagram not reproduced. Recoverable labels: top node
(are-busy-with, family, any-object); proposition (is-new, book).
Legend, relations and the macrorules at work on them:
qualification - deletion; generic - generalization; part-whole - construction.]



Fig. 3.3. Relations and macrorules structuring a family status report 



At the bottom of Fig. 3.3, we find the propositions that represent the meaning 
of the surface body text. From every proposition, we learn something about the 
situation in the family. All propositions are subsumed by a macroproposition 
that can be inferred from the propositions of the textual surface. Macroproposi- 
tions bundle several propositions and state the common meaning of all sub- 
sumed propositions. The macroproposition is therefore more abstract in its 
meaning, and it tells us the core meaning of a text unit, normally a paragraph. 
Each concept of the macroproposition is derived from the respective concepts 
of the basic propositions by a macrorule. For instance, the macrorule generali- 
zation abstracts the superconcept busy-with from the specific activity concepts 
wash, write, and paint. The macrorule deletion eliminates the proposition (is- 
new, book). It is allowed to do so because this quality of the book is not re- 
ferred to in the discourse. The construction macrorule puts the family concept 
together, integrating the family members father, mother, and daughter and replaces
the individuals by the name of the group. The construction macrorule is
knowledge-based, as are other reductive inferences in summarization. To decide
whether the mentioned individuals make up a family (as opposed, for instance,
to a sailboat crew), it needs a knowledge structure (a schema) that
stores a description of the family concept. If the family concept were defined 
otherwise, as possibly in an African culture, the construction rule would yield 
different results. 
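
The interplay of the three macrorules on the family status report can be
illustrated with a minimal Python sketch. It is a toy under strong assumptions,
not the procedure of KINT83: propositions are plain tuples as in Fig. 3.3, and
the hypothetical dictionaries GENERIC and PART_WHOLE stand in for the background
knowledge (schemata) that an understander brings along. All function and
variable names are invented for this illustration.

```python
# Toy background knowledge (assumed for illustration only).
GENERIC = {"wash": "busy-with", "write": "busy-with", "paint": "busy-with"}
PART_WHOLE = {frozenset({"father", "mother", "daughter"}): "family"}

# Propositions of the family status report: (predicate, agent, object)
# or, for a qualification, (predicate, object).
PROPOSITIONS = [
    ("wash", "father", "dishes"),
    ("write", "mother", "book"),
    ("is-new", "book"),
    ("paint", "daughter", "window-frames"),
]


def delete(props):
    """Deletion (toy version): drop every two-place qualification,
    assuming no other proposition depends on it."""
    return [p for p in props if len(p) == 3]


def generalize(props):
    """Generalization: replace each predicate by its superconcept (if one is
    known) and each object by the placeholder 'any-object'."""
    return [(GENERIC.get(pred, pred), agent, "any-object")
            for pred, agent, _ in props]


def construct(props):
    """Construction: if all propositions share predicate and object and the
    agents jointly form a known whole, replace them by one proposition."""
    predicates = {p[0] for p in props}
    objects = {p[2] for p in props}
    agents = frozenset(p[1] for p in props)
    if len(predicates) == 1 and len(objects) == 1 and agents in PART_WHOLE:
        return [(predicates.pop(), PART_WHOLE[agents], objects.pop())]
    return props


def macroproposition(props):
    """Apply the three toy rules in a fixed order."""
    return construct(generalize(delete(props)))


print(macroproposition(PROPOSITIONS))
```

If PART_WHOLE held no entry for father, mother, and daughter (for instance
because the family concept were defined differently), construct would return
its input unchanged. This mirrors the point that the rule is knowledge-based.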

Macrorules are recursive. They may apply to a sequence of macroproposi- 
tions and thus derive a higher level of macrostructure. Thus, a macroproposi- 
tion may in turn be subsumed by a higher order macroproposition which states 
the kernel meaning of a larger information complex (see Fig. 3.4 and also Fig. 
2.13). The uppermost macropropositions represent the gist of the whole dis- 
course. The root macroproposition represents the discourse theme. It normally 
corresponds to the meaning of a title. Since a summary is short and should 
contain the semantic core of the original discourse, the summary is represented 
by the macropropositions near the top of the hierarchy. A written or spoken 
summary is a verbalization of these macropropositions. 
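
The idea that a summary corresponds to the macropropositions near the top of
the hierarchy can be sketched as follows. The tree below is built by hand for
illustration (here from an extended version of the family report, with an
in-text summary as its root); the Node class, the summary function, and the
depth parameter are assumptions of this sketch, not part of the cited models.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A (macro)proposition together with the propositions it subsumes."""
    text: str
    children: list = field(default_factory=list)


def summary(root, depth):
    """Collect macropropositions level by level, from the root down to the
    given depth (depth 0 returns only the discourse theme)."""
    level, result = [root], []
    for _ in range(depth + 1):
        result.extend(node.text for node in level)
        level = [child for node in level for child in node.children]
    return result


# Hand-built toy macrostructure (hypothetical).
root = Node(
    "Supported by her family, mother managed to finish her book in time.",
    [
        Node("The whole family was busy.", [
            Node("Father was washing dishes."),
            Node("Mother was working on her new book."),
            Node("The daughter was painting the window frames."),
        ]),
        Node("The publisher needed the manuscript a month earlier."),
        Node("Father and daughter helped with drawings and proofreading."),
    ],
)

print(summary(root, 0))  # the discourse theme only
print(summary(root, 1))  # theme plus the first level of macropropositions
```

A larger depth yields a longer summary; the lowest level reproduces the
propositions of the text base.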

The macrorules work upon propositional meaning representations without 
specific restrictions. They accept a sequence of propositions and return a 
macroproposition that is entailed by the input propositions. Macrorules may 
operate both on the text base (a discourse representation) and on the situation 
model (a representation of events or states) activated by the discourse in the 
mind of the understander (KINT83). 

Macrorules as described here are theoretical devices. They cannot operate 
mechanically. Only because the understanders and summarizers apply them 
correctly under the guidance of their knowledge can they work successfully. 
This becomes very clear if we expand the above-cited report about the busy 
family and after that, summarize it again. The family situation report now says: 

Father was washing dishes. Mother was working on her new book. The daugh- 
ter was busy painting the window frames. All of a sudden, the publisher called 
in and told mother that he needed the manuscript a month earlier than foreseen. 
Father left the dishes and finished mother’s drawings instead. The daughter 
dropped the brush and rushed to do the proofreading. Supported by her family, 
mother managed to finish her book in time. 

As one finds quite often, the last statement of our extended example conveys 
an in-text summary: 

Supported by her family, mother managed to finish her book in time. 

In this version, dishwashing and window painting appear as background activi- 
ties without real influence on the outcome. Therefore, they can be deleted. We 
note that the deletion macrorule is applied under the guidance of factual 
knowledge. In particular it is useful to know the point of the whole story before 
deciding about the treatment of an individual statement. This knowledge is not 
yet available when we learn about the dishwashing father, neither was it available
when the first summary above was derived. If we assume that an understander
summarizes the extended family situation report concurrently to support
his or her own understanding, then the first hypothetical summary “The
whole family was busy” may be inferred, but it is abandoned as soon as the impact
of the main event, namely the situational change due to the publisher’s
call, is grasped.

Again, factual knowledge is needed to conclude that father, mother, and 
daughter qualify as a family, and then to aggregate them to a family by the 
construction rule. When inferring that doing the drawings and proofreading are 
good ways to support an author, it is clear that the reader’s and summarizer’s 
own background knowledge is called on. Only with some knowledge about au- 
thoring can we apply the generalization rule. 




Fig. 3.4. Forming the macrostructure (adapted from KINT83)



On second thoughts and after the discussion above, we conclude that macro- 
rules correspond to reductive knowledge-driven inferences (see KINT94). The 
obvious next question is how they integrate with other cognitive activities and 
knowledge structures in our memory during the process of knowledge-driven 
summarization. 

Knowledge-driven summarization using schemata and operators. To set
up their integrative model of summarizing, SCHN81 studied the summarizing of
instructional texts by 10 student test subjects. As in many experimental 
studies, summarizing is considered here as a natural function of the memory. 
The summary shows what the test subjects have remembered from the text. In 
reaction to features of the text, memory picks up or ignores individual meaning 
elements. Summarizers can only reproduce such knowledge in a summary that 
they have integrated into their memories. Reproduction in the form of a renar- 
ration or summary, in turn, is not simply a retrieval from some storage, but a 
reconstruction on the basis of the existing representation. Consequently, a
shortened reproduction - a summary - is not only dependent on the original 
text but is also determined by the processing activity of the summarizer. This 
constructivist approach to memory contributes to asking and answering the 
question of how a summary is constructed from input information. 

Macro-operators and other inferences. SCHN81 asked what processes actu- 
ally take place in summarization. They distinguish two classes of processes 
(see Figs. 3.5 and 3.6): 

• horizontal processes rework meaning units by integrating the under- 
stander’s own knowledge 

• vertical processes change (interpret, reduce, or expand) incoming or out- 
going knowledge representations. 

The horizontal processes are inferences that add knowledge from the reader’s 
store to meaning units derived from input text. Three types were observed: in- 
tended inferences, elaborations, and restructuring acts. Intended inferences ex- 
plicitly state units of meaning that the author left out on the justified assump- 
tion that the reader would infer them. Elaborations add new cognitive repre- 
sentations to the text. In contrast with intended inferences, elaborations are 
neither given by the text itself nor intended by the author. In the restructuring, 
units available in the original text are linked together in another form. The 
horizontal processes are not specific for summarization but occur normally dur- 
ing text understanding. 

Among the vertical processes, SCHN81 more or less find the macrorules of 
KINT83. Deletion appears as the simplest macro-operation, although it implies 
possibly complicated decisions. Indeed several reasons for deleting elements 
were observed: items were deleted because they did not contribute to text co- 
herence or to the interpretation of other statements, possibly also because 
readers were assumed to know their content anyway. Generalization / abstrac- 
tion replaces one or several meaning units by a more general and abstract one 
which is implied by them. The inverse macro-operations correspond to the pro- 
duction strategies of KINT83 (see Sect. 2.5.1). 

In the course of the experiment, the test subjects processed the source text 
several times by reading and summarizing. In the initial phase of the learning 
process, the horizontal semantic processes predominate, as the test subjects 
are not advanced enough in their processing of the text to master their actual 
task, namely reducing the text to its important points. Macro-operators can in 
fact be applied to the already established semantic substructures in the initial 
phase of the learning process. However, as long as the reader is still concerned 
with understanding the content of the text, in other words as long as the mi- 
crolevel of the text is still flexible, the meaning assignments which concern 
the macrolevels of the text organization can be no more than temporary. The 
reader cannot integrate subordinated meaning units into the macrostructure
until he or she has established a stable and coherent overall structure.

The role of the macrostructure. The macrostructure evidently also comes 
into play in the selection of meaning units for the summary. SCHN81 used as 
test material an introduction to the basic concepts of symbolic interactionism 
divided into the textual components introduction, fundamental anthropological 
assumptions of symbolic interactionism, definition of symbolic interactionism, 
methodical approach of symbolic interactionism, and differentiation from other 
lines of research. 

With slight deviations, this macrostructure would also be suitable for intro- 
ductions to other fields of knowledge. It is characterized at least as much by its 
communicative and didactic function as by the content it presents. It is a good 
example of how the macrostructure of a discourse is dependent not only on the 
subject but also on its discourse-specific presentation form (the discourse type) 
and its communicative function. Both the discourse schema (the macrostruc- 
ture or superstructure of the text) and the global structure of the subject matter 
(its cognitive schema) support the summarization process by providing prior 
knowledge about the meaning assignments to be achieved. They determine in 
particular about which points information from the discourse is required. 

The authors asked their test subjects to produce two summaries of different 
lengths. In order to find out something about the effect of the text macrostruc- 
ture, they divided statements from the original which appeared in one of the 
two summaries into two categories: above the medium frequency and below 
the medium frequency. The more frequently selected meaning units included 
all the meaning units from the introduction and the section on basic anthropo- 
logical assumptions, 75% of the general statements about the definition of 
meaning, all general statements concerning the methodical approach, and one 
general statement concerning the differentiation from other lines of research. 
Of the less frequently selected meaning units, 83% comprised subordinate 
detail information about the methods of sociologists and psychologists or infor- 
mation from examples. 

Evidently, the test subjects took the text schema into consideration in select- 
ing meaning units for their summaries. The data suggest that the “average 
macrostructure” in the mental representation of the test subjects encompassed 
all the text components and was restricted to the general statements from the 
original text. In comparison with the original text, the relative weight of the 
text components is shifted, the introductory meaning units assuming particular 
importance. These findings can be interpreted as evidence of the selective 
function of text schemata in semantic processing and of a situation-related 
flexibility of the schemata. The adaptation to the task in hand can be seen 
among other things in the proportion of examples in the particularly concise 
second summary and the subsequent free reproduction of the text: the second 
summary contains only 1% examples, the longer free reproduction 21%. Since 
the free reproduction was produced after the summary, this effect cannot be put
down to the summarizers forgetting the examples. 

Summarizing by memory encoding and decoding. The comprehensive 
summarization model developed by SCHN81 (see Figs. 3.5 and 3.6) is divided 
into two symmetrical parts. One side describes the encoding of new material 
into memory, i.e., the text understanding, the other the decoding, i.e., the 
linguistic reproduction of the summary from knowledge schemata. 

The summarizing of texts is presented as a construction process with as- 
cending (text-guided) encoding and descending (schema-guided) decoding. 
During encoding, the meaning structures are formed which represent the con- 
tent of the text in the reader’s memory. During decoding, the meaning struc- 
tures are reactivated and translated into an oral or written summary. Decoding 
is therefore to a large extent a reversal of encoding. 

The summarizing model of SCHN81 applies concepts of knowledge and text 
processing which have been discussed above. Knowledge is represented in the 
form of schemata (frames - MINS75), which may differ in their degree of acti- 
vation. An activated schema enables understanding, because it provides the 
knowledge for the interpretation of the newly assimilated information. It also 
expresses itself in specific expectations (see also Sect. 2.4.3). They may trig- 
ger an expectation-driven search for information that closes the gaps in the 
schema. Processing a knowledge element may be assigned to several sche- 
mata. Apart from diverse schemata for areas of reality (family, traveling by 
train, etc.), we must also assume schemata for letters of the alphabet, word 
schemata, syntactic schemata, and discourse schemata in order to explain the 
processing of discourses. Discourse schemata (see Sect. 2.5.1) represent the 
typical structure of a conventionalized text, for example of a sonnet or a user 
manual. 
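
The notions of schema activation and expectation can be made concrete in a
small sketch. It assumes, purely for illustration, that a schema is a named set
of slots, that its degree of activation is a simple counter, and that its
expectations are the slots still open. The Schema class, the process function,
and the example schemata are hypothetical and not taken from SCHN81 or MINS75.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    name: str
    slots: dict = field(default_factory=dict)  # slot name -> filler or None
    activation: int = 0

    def assimilate(self, slot, filler):
        """Fill a slot if the schema has it; matching input raises activation."""
        if slot not in self.slots:
            return False
        self.slots[slot] = filler
        self.activation += 1
        return True

    def expectations(self):
        """Open slots of an activated schema: information still to look for."""
        if self.activation == 0:
            return []
        return [slot for slot, filler in self.slots.items() if filler is None]


def process(element, schemata):
    """Offer one (slot, filler) element to all schemata; several may take it."""
    slot, filler = element
    return [s.name for s in schemata if s.assimilate(slot, filler)]


train = Schema("traveling by train",
               {"traveler": None, "ticket": None, "destination": None})
purchase = Schema("purchase", {"buyer": None, "ticket": None})
family = Schema("family", {"father": None, "mother": None, "daughter": None})

takers = process(("ticket", "single to London"), [train, purchase, family])
print(takers)                 # schemata that assimilated the element
print(train.expectations())   # gaps that drive an expectation-driven search
print(family.expectations())  # not activated, hence no expectations
```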

On the encoding side of the model (see Fig. 3.5), it can be seen that first of 
all subsemantic processes, from the recognition of letters to a syntactic-se- 
mantic analysis, are worked through in order to build up initial meaning struc- 
tures. After this, semantic processes can be activated. Horizontal processes add 
elements to the meaning structure that do not directly derive from the text, but 
are inferred with the help of cognitive schemata. Most noteworthy among them 
are elaborations, inferences that close meaning gaps in the input by drawing 
upon knowledge from the understander’s own stock. Vertically operating 
macro-operators turn hierarchically lower units of meaning into higher, more 
global units, which reproduce the content in a more abstract fashion. Here, we 
find the above-mentioned reductive inferences at work. In their definition, they 
differ slightly from the version cited above from KINT83. In particular, we find 
a positive selection operator which is the opposite of the deletion rule. The 
concentration operator picks meaning items that may be dispersed through a 
text and assembles them in one proposition, thus building a macroproposition. 
SCHN81 include it on the basis of experimental findings. As in the KINT83 
model, reductive inferences can be used more than once in order to produce
increasingly terser representations. In forming a hierarchically higher meaning 
structure, horizontal processes can again be activated in addition to the macro- 
operations. Lastly, meaning units which have not been deleted by the macro- 
operators are stored in the appropriate set of knowledge schemata in memory. 

Fig. 3.5. Memory encoding - setting up a discourse representation (adapted from SCHN81)

As a result of the interlinking of ascending and descending processes, the
different processing levels are interactive: through an information element that
corresponds to a part of a schema, the schema is activated in an ascending 
manner. This awakens expectations with respect to further text information. 
They are verified on the text in a descending manner or included in the mean- 
ing representation as elaborations. All individual processes are potentially car- 
ried out parallel to each other. 




Fig. 3.6. Reconstructing a discourse from memory representation (from SCHN81) 



On all levels of processing, cognitive schemata provide the necessary fact 
knowledge. In the case of subsemantic processes, the character segments have 
already been interpreted with the help of schemata. Cognitive text schemata
control the entire construction process through expectations with regard to the 
discourse structure. Understanding and reduction end when the knowledge 
learnt from the text has been integrated in the reader’s knowledge schemata 
and thus stored in memory. 

On the decoding side of the model (see Fig. 3.6), as on the encoding side, 
ascending and descending processes are interlinked and controlled by cogni- 
tive schemata. Here, the units of meaning formed during encoding are reacti- 
vated and translated into a linguistic summary. To this end, representations 
from all processing levels can be used. However, it is assumed that surface
features like the concrete formulation of a sentence are usually forgotten as soon
as a schema has been formed. Thus, when it comes to decoding, only parts of 
the meaning structure are readily available. From these, a coherent text must 
be reconstructed. Since the higher meaning structures are more completely pre- 
served than the lower ones, “inverse macro-operators” must reconstruct deleted 
meaning items, especially on the lower representation levels. With the help of 
cognitive schemata, the thinned-down representation preserved in the higher 
levels of the macrostructure is again filled out with detail knowledge. It is then 
re-verbalized. This reconstruction process shows up through the lexical
variations or paraphrasing with respect to the source text. The most visible effect of
semantic reconstruction is a form of semantic normalization. Since the par- 
ticular characteristics of the events reported in the source discourse may have 
been forgotten, they tend to be replaced by standard assumptions from memory 
which match input facts approximately, but not necessarily precisely. 



3.3.2 An empirical look at summarization strategies or operators 

There are advantages in enlarging our view of the cognitive activities during 
summarization to include a greater variety of intellectual strategies with a 
more precise definition. 

ENDR91 have employed a thinking-aloud approach to look at their own 
summarization processes in order to find out more precisely which strategies 
occur in summarizing. The three authors work in the context of ordinary aca- 
demic summarizing. Their test material consists of two introductions to Ger- 
man Prolog textbooks, about two pages in length. 

They start out with the above-cited macrorules in a version revised by 
SHER89 (arranged by level of difficulty): 

• Delete trivial and redundant information. 

• Select a topic sentence already in the text. 

• Substitute a general term for a list of objects or a sequence of actions. 

• Invent a topic sentence if it does not appear in the text. 

These four summarization rules serve as prototypes for the definition of
additional strategies.

After the transcription and segmentation of the working processes, the three 
authors concentrate on the drafting of the summary, disregarding the revision 
phase. The authors explain their observable behavior in terms of the four 
macrorules or, if the rules do not fit observation, by newly invented strategies 
of the same granularity. The new strategies are collected in an intellectual 
toolbox and ordered according to their functions: general inference, planning 
and control, knowledge acquisition, relevance judgements, meaning reduction, 
condensation, construction, and output. 

The core reduction strategies in the classes meaning reduction and condensa- 
tion (i.e., abridging the information size without loss of content, especially by 
terser formulation) are found in the company of supporting strategy classes as 
predicted by the discourse processing model of KINT83. However, within the
classes, several strategies have been listed and defined more precisely. Since 
finally every deletion or selection of information needs a justification, the rele- 
vance assessment strategies attract particular attention. ENDR91 find that 
different arguments and items of background knowledge are used to decide 
about the importance of an item: 

• fact: Relevant is what is important according to domain knowledge. 

• topic: Relevant is what relates to the text topic. 

• purpose: Relevant is what serves the purpose. 

• relpositive: Relevant is what is stated positively. 

• contrast: Relevant is what differs from other things. 

• stress: Relevant is what is characterized as relevant in the text. 

Meaning reduction strategies also differ from the former macrorules because 
they use knowledge and require some reasoning. For instance: 

• noreason: If you have the statement, do away with its reasons. 

• novoid: Leave it out if it is not informative. 

• nocomment: No comments and added explanations. 

• noexample: Drop examples.
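
The four meaning reduction strategies listed above can be read as filters over
meaning units. The following sketch assumes that each unit has already been
labeled with a role (statement, reason, example, comment, or void); this
labeling is the knowledge-intensive part and is simply given by hand here.
The sample units, the role labels, and the function reduce_units are invented
for illustration and are not data from ENDR91.

```python
# Strategy name -> role of the meaning units it removes (assumed mapping).
STRATEGIES = {
    "noreason": "reason",
    "novoid": "void",
    "nocomment": "comment",
    "noexample": "example",
}

# Hand-labeled toy units (hypothetical, not from the ENDR91 test material).
UNITS = [
    ("statement", "Prolog programs consist of facts and rules."),
    ("reason", "This is because Prolog is based on predicate logic."),
    ("example", "For instance, parent(tom, bob) is a fact."),
    ("comment", "Beginners often find this unusual."),
    ("void", "This chapter begins at the beginning."),
]


def reduce_units(units, strategies=tuple(STRATEGIES)):
    """Return the texts that survive the chosen reduction strategies."""
    dropped = {STRATEGIES[name] for name in strategies}
    # noreason only applies "if you have the statement".
    if not any(role == "statement" for role, _ in units):
        dropped.discard("reason")
    return [text for role, text in units if role not in dropped]


print(reduce_units(UNITS))
print(reduce_units(UNITS, strategies=("noexample",)))
```

Applying all four strategies leaves only the statement; applying noexample
alone removes just the example.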

Further reduction of the summary size is achieved by weeding out redundancy 
and choosing terse formulations. During writing, all three authors have in mind 
a summary-specific presentation that should be correct, positive, understand- 
able without access to additional material, and concrete. 



3.4 Assessing importance (relevance, interestingness)



3.4.1 Introduction 

A summarizer who reduces a representation to the most important information
is constantly deciding about the importance of meaning items. The reader is
invited to explore the dimensions of relevance by looking through the following 
organized list of examples: 

1. Arguments referring to the topic 

The structure of the topic. The structure of the topic determines what is to be in- 
cluded in a summary. Core features of the object of description are important. 
Examples: 

• If the object is a computer system considered from the point of view of the 
user, then its functions are essential, because this is what the recipient of 
the original and the summary asks about. 

• If the object is the history of a literary figure (e.g., Cassandra or Icarus), 
then the situations in the history of ideas where these figures play a role 
are central. 

• If we are dealing with a completed operation (e.g., the conquering of Mexico
by the Spaniards, the story of the Pied Piper of Hamelin or Alice’s
adventures in Wonderland), then the course of action is decisive for the
summary. 

2. Arguments referring to reception and audience 

Adaptation to information needs. A summary should answer the summary users’ 
questions. Important is what fits their needs. 

Examples: 

• If a fashion magazine requires a summary report about what designer 
gowns the ladies wore to a music festival, then details about the singers, 
stage set, musical interpretation of the band, etc., are of marginal impor- 
tance. 

• If the same event is to be presented succinctly on the society page of a 
quality newspaper, then the artists’ performance is central and the ladies’ 
gowns are not even worth mentioning. 

Ease of knowledge assimilation. A summary is intended to present information 
that can be assimilated by the audience. This is why it has to look at the prior 
knowledge of its recipients. What they are unable to assimilate can be omitted 
without any harm in favor of information that reaches the audience. 
Examples:

• The summary of a statistics paper for doctors must select content items in 
such a way that it does not presuppose any expert knowledge of statistics. 

• A summary report about an organizational change addressed to the com- 
pany’s management has to focus on decision-relevant information that can 
be correctly interpreted without additional background knowledge. 

Information value, innovation value, interestingness. Summaries have to fulfill 
a purpose by being effective as a means of communication. Therefore, from the 
information contained in the original, they must focus on what is new and in- 
teresting for the audience, i.e., what has an information value. Interestingness 
may arise through the summary user’s cognitive or emotional commitment. 
Whatever imparts an information value to a summary is important. 

Amount of information in a summary. A summary which is constrained to a 
maximum of 40 words cannot convey as much information as a summary of 
150 words. Consequently, all sorts of relevance decisions must be applied more 
restrictively, irrespective of the basic arguments. 

3. Arguments referring to the author of the original 

Focus and text design. Most of the time, the author has good reasons for choos- 
ing a certain text organization, usually in line with the demands of the de- 
scribed object. A good summary should respect the author’s design wherever 
possible. 

Examples: 

• If in a history of the British Empire an author emphasizes the 18th and 
19th centuries and writes little about the 20th century, the summary should 
reflect these proportions. 

• Since Charles M. Schulz depicted Peanuts in illustrations and text, a good 
summary should also take this form into account, i.e., include a selection 
of the original drawings. 

• If the original includes concrete figures, the summary should also work 
with figures. 

The author's intentions. The author’s intention is important. A summary should 
convey the author’s reasons for writing. For instance, if an author defends the 
policy of a certain administration, this should come across in the summary. If 
the author tries to create a certain atmosphere, e.g., the spirit of Renaissance
Italy, the summary should reflect this. 

4. Communication possibilities of the summarizer 

Limits imposed by cognitive capacity and own knowledge. Summarizers may 
have to process more information faster than they are able to. Normally they 
will have developed intellectual techniques to cope with such overtaxing
situations, and will nevertheless produce good summaries. Where all else fails, they
have to fall back on safe statements, even at the expense of the information 
value of the summary. What is beyond personal understanding is unimportant. 

Limits imposed by the media situation. Summarizing means producing a minia- 
ture of a larger original. In comparison with the original, we may have to re- 
consider the means of representation, often reducing them to a simpler level. 
Examples: 

• A radio commentator summarizes a soccer game. (S)he is restricted to 
spoken language. 

• A TV commentator may use sound and image material to present the same 
game. Time may be a restricting factor in both cases. 

This list of influential factors with regard to what is important in summariza- 
tion is certainly incomplete. However, it shows how many factors a sum- 
marizer has to consciously keep in mind when selecting the relevant informa- 
tion. It is worth investigating in detail how relevance judgements during sum- 
marizing are obtained. 



3.4.2 Importance depending on source information features 

Of the factors determining the organization of summaries, the one that has so
far received the most attention among researchers is the influence of the
source information. The basic line of argument is that the summary is, as it
were, embedded in the source information. People react to source information 
features when they summarize a text. In typical experiment designs based on 
the assumption that the summary is steered by the source information, re- 
searchers ask test subjects to produce summaries under controlled conditions. 
The recall (the contents of the summary) is then explained through the mean- 
ing structure of the source text or through the combination of source text and 
cognitive processing (see the study of SCHN81 described above). 

Features of the source information that influence the contents of a summary 
are looked for principally in the structure of the treated object and in the text 
structure. As already explained above, texts often faithfully reproduce the 
structure of events or objects. As soon as the amount of information increases, 
however, recipients require communicative aids in addition to the core infor- 
mation that facilitate understanding. An example of a representation of a sub- 
ject matter formed from a presentation point of view is the introduction to 
symbolic interactionism used by SCHN81 (see above) with its text-type spe- 
cific superstructure, and the Circle Island story (see Fig. 3.16.). 

Studies concerned with the connection between source information and 
summary for the most part concentrate on the impact of a single factor. In the
following, we first of all describe approaches which derive the structure of
summaries from the described object and its representation. We then extend 
our observations to discourse structures that summon up discourse components 
to convey a subject, for example including orientation assistance for the reader 
in the form of a setting or an introduction. 



3.4.2.1 Semantic constituents of events and action-based stories 

In simple action-based folk tales, the grammar of the events and the grammar 
of the story coincide. Stories and actions follow event schemata (RUME75, 
RUME77). With respect to summarizing, the story and event schemata func- 
tion like macrostructures, allowing the summarizer to choose the most impor- 
tant subevents or respective statements. They have roughly the following 
structure: first, something happens to the protagonist that sets up a goal for her 
or him to accomplish. Then the remainder of the story is a description of the 
protagonist’s problem-solving behavior as (s)he seeks to accomplish her or his 
goal, going through a sequence of episodes. The episodes are built according 
to an episode schema (see Fig. 3.8) that has a cause schema, a try schema, 
and an outcome schema as its immediate constituents. 

RUME77 proposes a summarization procedure that exploits event schemata. 
He checks his model by having 10 subjects summarize a set of brief stories, 
among them The countryman and the serpent (see Fig. 3.7). He observes that 
the summaries predicted by the model and those obtained from his test sub- 
jects correspond: 94% of the expected summary statements were actually ob- 
served, and 88% of the observed statements were expected by the model. 
Summarization of short folk tales seems to be guided by the semantic structure 
of the plot. 
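
The two percentages can be read as overlap ratios between the set of statements the model expects and the set of statements the subjects actually produce. The following minimal Python sketch shows that reading; the function name and the statement identifiers are hypothetical illustrations, not data or procedures from RUME77.

```python
def agreement(expected, observed):
    """Return two ratios: the share of expected statements that were observed,
    and the share of observed statements that were expected."""
    expected, observed = set(expected), set(observed)
    common = expected & observed
    return len(common) / len(expected), len(common) / len(observed)


# Hypothetical statement identifiers, for illustration only.
expected = {"s1", "s2", "s3", "s4"}
observed = {"s2", "s3", "s4", "s5", "s6"}
expected_hit, observed_hit = agreement(expected, observed)
```

A model that predicts too much lowers the first ratio, while a model that predicts too little lowers the second, so both figures are needed to judge the fit.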

The importance of an item is derived from two features: 

• the hierarchical level in the event scheme: low-level items are deleted 

• the semantic category and its treatment in summarization formulas: psy- 
chological actions are deleted altogether, whereas for example successful 
actions are reported. 

The Aesop fable The countryman and the serpent helps us to discuss the
summarizing method of RUME77 in more detail. Figure 3.8 gives an interpretation
of the fable that resembles an instantiated macrostructure (see Fig. 3.4 above). 

The overall episode is experienced by the countryman. The components of 
the countryman’s struggle with the snake are represented in propositional form. 
Propositions that appear at the surface of the text are accompanied by numbers 
referring to their respective text statements. Other propositions are inferred 
from information in the text. According to the episode schema, an episode is
composed of sub-episodes or subevents. For instance, the countryman’s
attempt to get an axe consists of his choice to use an axe, of his taking the axe,
and of the consequence of having it at his disposal. The episode structure
maps onto the more general macrostructure of Fig. 3.4 above, if we assume 
macrostrategies that work by construction, putting together more global epi- 
sodes from more elementary ones. While in The countryman and the serpent 
story, every episode imposes categories on its members (cause, try, and out- 
come), this is not required in the general macrostructure. However, the macro- 
structure states that the overall topic sentence be represented by the uppermost 
structural level, and that lower-level nodes be less central in the plot. This is 
the case in the hierarchical semantic structure of the Aesop fable. 



(1) A countryman’s son, by accident, trod upon a serpent’s tail. 

(2) The serpent turned 

(3) and bit him, 

(4) so that he died. 

(5) The father, in revenge, 

(6) got his axe, 

(7) pursued the serpent, 

(8) and cut off part of his tail. 

(9) So the serpent, in revenge, 

(10) began stinging several of the farmer’s cattle. 

(11) This caused the farmer severe loss. 

(12) Well, the farmer thought it best to make it up with the serpent. 

(13) So he brought food and honey to the mouth of its lair 

(14) and said to it, “Let’s forget and forgive; perhaps you were right to 
punish my son and take vengeance on my cattle, but surely I was 
right in trying to revenge him; now that we are both satisfied, why 
should we not be friends again?” 

(15) “No, no,” said the serpent, “take away your gifts; you can never, 
never forget the death of your son nor I the loss of my tail.” 



Fig. 3.7. The countryman and the serpent (from RUME77) 



The representation of the countryman’s conflict with the snake is composed ac- 
cording to a general episode schema (see Fig. 3.9). It says at the upper or- 
ganizational level that there is an episode about protagonist P. An event E 
causes P to desire goal G. P tries to get G until outcome O. 

episode (C)
    cause (episode (S), desire (C, peace))
        cause (step-on (B, S), desire (S, revenge))   (1)
        try (S, get (S, revenge))
            select (S, bite (S, B))
            try (S, orient (S, to B))
                select (S, turn (S, to B))
                turn (S, to B)   (2)
                C = oriented (S, to B)
            bite (S, B)   (3)
            C = die (B)   (4)
        O = episode (C)
            cause (kill (S, B), desire (C, revenge))   (5)
            try (C, get (C, revenge))
                select (C, chop (C, S))
                try (C, get (C, axe))
                    select (C, take (C, axe))
                    take (C, axe)   (6)
                    C = have (C, axe)
                try (C, get (C, at (C, S)))
                    select (C, chase (C, S))
                    chase (C, S)   (7)
                    C = at (C, S)
                chop (C, S)   (8)
                C = lose (S, T)   (8)
            O = episode (S)
    try (C, get (C, peace))
        select (C, offer (C, G, to S))
        try (C, get (C, at (G, S)))
            select (C, bring (C, G, to S))
            bring (C, G, to S)   (13)
            C = at (G, S)
        offer (C, G, to S)   (14)
        C = refuse (S)   (15)
    O = not (have (C, peace))   (15)

C: countryman          T: serpent’s tail
S: serpent             CTL: countryman’s cattle
B: countryman’s son    G: gift of food and honey



Fig. 3.8. From the schema of the countryman and the serpent (from RUME77) 

At the lower level, we get definitions of subschemata, e.g., a try schema: 

try: agent A tries to get goal G. 

(1) A selects a method M which could lead to G. 

(2) For each precondition P of M, A tries to get P until outcome O. 

(3) A does M which has consequence C. 



episode (P)
    cause (E, desire (P, G))
    try (P, get (P, G))
        select (P, M)
        try (P, preconditions (M))
        do (P, M)
        C = consequence (do (P, M))
    O = outcome (try (P, get (P, G)))

P: protagonist of episode    E: initiating event
M: method chosen             O: outcome of episode
G: goal of episode           C: consequence of enacting the method



Fig. 3.9. Structure of an episode (from RUME77) 



rule 1: episode (P)
    summary of try (P, get (P, G)).
    as a result, summary of O.

rule 2: try (P, get (P, G))
    (a) if precondition, delete.
    (b) if successful, P got G by M.
    (c) if unsuccessful, P tried to get G by M, but summary of C.

rule 3: psychological actions: desire, select, feel
    delete.

rule 4: outcomes and consequences
    delete if redundant with previous sentence.

rule 5: methods
    P did M to get G.

rule 6: cause (E1, E2)
    when summary of E1, summary of E2.
    note: if either E1 or E2 is a psychological action, then just summarize the
    remaining event.

rule 7: any node for which no other rule exists is fully expressed by
    converting the predicate name to an English verb with the first
    argument as its subject.



Fig. 3.10. Summarization formulas (from RUME77) 

Rules that deal with the different types of nodes can then be applied to make a 
story summary from the reduced structure diagram (see the summarization 
formulas in Fig. 3.10). Summarization rule 3, for instance, deletes 
psychological actions altogether. The try rule (number 2) is the most complicated 
because it deals with different cases. If the protagonist tried to establish a 
precondition for further action and failed, this fact can be deleted for lack of 
interest (case a). If (s)he was successful, the accomplishment is reported (case 
b). In case of a failure in a main operation, the fact is described and comple- 
mented by a statement of the consequences (case c). 

A summarizer or a summarization procedure can exploit the macrostructure 
when reducing an action-based story such as the one about the countryman and 
the snake. The uppermost nodes are preserved, the lower ones are pruned, for 
example by deleting all nodes below a level N. What remains is the represen- 
tation of a summary of level N of the story. Now, from left to right, all nodes of 
the reduced tree are expressed according to the summarization formulas. After 
that, the summary is complete. 
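
The pruning step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of RUME77: the `Node` class, the node kinds, and the English glosses are hypothetical, and of the summarization formulas only rule 3 (deleting psychological actions) is implemented. The sketch cuts the tree at level N and then reads the remaining leaves from left to right.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # e.g. "episode", "cause", "try", "select", "do"
    text: str                       # hypothetical English gloss of the proposition
    children: list = field(default_factory=list)


# Rule 3 of the summarization formulas: psychological actions are deleted.
PSYCHOLOGICAL = {"desire", "select", "feel"}


def prune(node, max_level, level=0):
    """Copy the tree, dropping every node that lies below level max_level."""
    if level >= max_level:
        kept = []
    else:
        kept = [prune(child, max_level, level + 1) for child in node.children]
    return Node(node.kind, node.text, kept)


def express(node):
    """Read the leaves of the reduced tree from left to right,
    skipping psychological actions."""
    if node.kind in PSYCHOLOGICAL:
        return []
    if not node.children:
        return [node.text]
    statements = []
    for child in node.children:
        statements.extend(express(child))
    return statements


# Hypothetical, much reduced encoding of the top of the fable's structure.
fable = Node("episode", "the countryman's episode", [
    Node("cause", "a serpent took revenge on the countryman"),
    Node("try", "the countryman tried to make peace", [
        Node("select", "he chose to offer a gift"),
        Node("do", "he offered food and honey"),
        Node("consequence", "the serpent refused"),
    ]),
    Node("outcome", "he did not get peace"),
])

level1 = express(prune(fable, 1))
level2 = express(prune(fable, 2))
```

With N = 1 the sketch keeps only the cause, the try, and the outcome of the overall episode; with N = 2 the try node is replaced by its non-psychological parts, so the summary becomes longer and more detailed.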

The summary of the Aesop fable might read: A countryman tries to make 
peace with a serpent that takes revenge on him, but the serpent declines his 
peace offer. 



3.4.2.2 Plot units: Semantic structure in terms of affect states 

We may think that in action-based stories problem solving is paramount, and 
organize the story in modules about a hero busy solving problems and their 
subproblems until (s)he reaches the solution and the end of the main episode. 
A summary of an action-based story is then a short statement of the hero’s 
overall problem-solving activity. But we may also interpret stories in terms of 
human feelings. In this case, emotional reactions and states of affect are cen- 
tral in narratives. Most of the time, the hero interacts with an environment that 
may please or displease her or him. Then the story is about human affects in- 
fluenced by events, and heroes of folk tales are pursuing happiness. If so, sto- 
ries are built as patterns of affect configurations, and summaries first and 
foremost report the protagonist’s feelings. 

Affect configurations can be expressed by plot units (LEHN82b). They are 
built from three simple affect states: 

+ (positive events) 

- (negative events) 

M (mental states with neutral affect) 

In representations of plot units (see Fig. 3.11) arrows indicate causal links. A 
link that runs from one mental state (M) to another describes motivation (m), 
while a link connecting a mental state (M) to a positive or negative event (+ 
or -) describes actualization (a). The termination link (t) states that the 
affective impact of an event is supplanted or displaced. Equivalence links (e) are 
used to separate multiple perspectives of a single affect state. 

Elementary combinations of the three basic affect states and the four links 
are shown in Fig. 3.11. A mental state (M) which actualizes (a) as a positive 
event (+) gives the primitive plot structure of a success; when a mental state 
(M) gives rise to an act which is not satisfactory, this is a failure. 

These states of affect are bound to a character. As soon as two or more char- 
acters are involved in a story, multiple affect states are required to describe 
the situation. Diagonal transfer links show how the state of one mind influences 
the other (see the plot units competition and denied request). 
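
The primitive configurations named so far can be written down as a small lookup table. The Python sketch below is an illustration only: the encoding of states as a list, the triple format of the links, and the example character are assumptions made here, not the notation of LEHN82b. It covers just the three primitives described in the text (motivation, success, failure).

```python
# (source state, link type, target state) -> primitive plot unit
PRIMITIVES = {
    ("M", "m", "M"): "motivation",   # one mental state motivates another
    ("M", "a", "+"): "success",      # a mental state is actualized as a positive event
    ("M", "a", "-"): "failure",      # a mental state is actualized as a negative event
}


def primitive_units(states, links):
    """states: affect states of one character in story order ("+", "-", "M").
    links: (from_index, link_type, to_index) triples between those states.
    Returns the primitive plot units found, with the indices they span."""
    found = []
    for src, kind, dst in links:
        name = PRIMITIVES.get((states[src], kind, states[dst]))
        if name is not None:
            found.append((name, src, dst))
    return found


# Hypothetical character: a wish that is fulfilled, then a wish that is not.
states = ["M", "+", "M", "-"]
links = [(0, "a", 1), (2, "a", 3)]
units = primitive_units(states, links)
```

Plot units that involve two characters, such as competition or denied request, would additionally need the diagonal transfer links between the characters' state sequences, which this sketch leaves out.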



motivation success failure competition denied request 




a: actualization 
m: motivation 



Fig. 3.11. Plot units (from LEHN82b) 



In the COMSYS story (see below), the overall story plot is made up of simpler 
plot units. Among others, it uses the competition and the denied request plot units 
defined above. The positive event for John (getting his promotion) is at the 
same time negative for Bill’s state of feelings (see the diagonal arc). When 
John asks for a new job at COMSYS, the diagonal arc represents a request di- 
rected towards Bill. Bill reacts negatively to John’s request. 

Verbatim summaries of the COMSYS story by test subjects state the plot 
structure as given in Fig. 3.12. They may read as follows (more examples in 
LEHN82b): 

1. John, Bill compete for a job which John wins, causing Bill to quit the company and 
start his own firm (COMSYS), which leads to Bill’s spiteful rejection of John’s re- 
quest for a job some years later in Bill’s successful company. 

2. Bill turned John down for a job because John had beat him out of a promotion when 
they both worked for IBM. 

3. Bill started his own business COMSYS after losing out to John for a job at IBM and 
later out of spite refused to give John a job when John was dissatisfied with his old 
one. 

In terms of semi-formalized plot units the three summaries may be formulated 
as follows: 

1. COMPETITION which {John’s SUCCESS, Bill’s FAILURE} causing Bill’s SUCCESS 
which leads to {RETALIATION, DENIED REQUEST, John’s FAILURE} of John’s 
REQUEST. 

2. RETALIATION when John’s REQUEST, because COMPETITION (infer John’s 
SUCCESS, Bill’s FAILURE, DENIED REQUEST, John’s FAILURE). 

3. COMPETITION which {John’s SUCCESS, Bill’s FAILURE} so Bill’s SUCCESS and 
later had the opportunity to {DENIED REQUEST, John’s FAILURE} (implicit John’s 
REQUEST, RETALIATION). 

The skeleton presentation reveals behind the verbal summary well-known 
event structures, the plot units. They are somehow natural clumps of informa- 
tion that transmit some commonsense knowledge about the reason for actions. 
For instance, Bill’s failure and John’s success are tied to their competition by 
fact. Since we all know many plot units such as FAILURE or RETALIATION, 
we can use them to set up story structures inductively. 

During summarization, plot units can serve as abstraction devices, because 
they are known patterns or schemata in episodes of our own experience and of 
stories. LEHN82b found that good summaries by her test subjects mentioned 
the most interesting plot units. Importance decisions steered by plot units leave 
out detail and mention only the plot unit, i.e., the action and affective state 
schema, and the affected person. Anything else is not deemed important. 





Affect states shown for John (J): wants promotion, gets promotion, wants new job, is denied 



John and Bill were competing for the same job promotion at IBM. John got the 
promotion and Bill decided to leave IBM to start his own consulting firm, COMSYS. 
Within three years COMSYS was flourishing. By that time John had become 
dissatisfied with IBM so he asked Bill for a job. Bill spitefully turned him down. 



Fig. 3.12. The COMSYS story (adapted from LEHN82b) 



3.4.2.3 The causal network and the causal chain 

Story summarizing can also use causal relationships. From this point of view, 
events are of primary interest for the summary if they belong to the causal 
chain from the beginning to the end of the story. Anything else can be dropped 
as less important. 

We infer the causal chain which is the means for connecting underlying 
events/states in a text (SCHA75). When understanding a paragraph or a para- 
graph-length story, we infer the conceptualizations of the underlying sentences 
and events. In addition, we infer assumptions about the background of the story. 
Without this context, the particular events would be meaningless. Understand- 
ing means finding out why things in the story happen as they do, i.e., inferring 
how an individual event is brought about or enabled by a preceding one, and 
what are the consequences of a particular action or state of affairs. In order to 
tie together the episode or story, we do at least the amount of inferencing 
which is useful to represent the input as an interconnected chain of representa- 
tions. 

If this is the case, the representation of a paragraph or a story is a combina- 
tion of the conceptualizations underlying the individual sentences plus the in- 
ferences about the necessary conditions that tie one conceptualization to an- 
other or to a given normality condition. 

What this means can best be explained through the example of SCHA75 
(compare the representation in Fig. 3.13): 

It was a warm June day. John began to mow his lawn. Suddenly his toe started 
bleeding. He turned off the motor. 

The lawnmowing network shows the events/states of the brief story (conceptu- 
alizations C1-C9) in propositional representation. Added are absolutely neces- 
sary conditions (ANCs) for the events reported. Absolutely necessary condi- 
tions must be given in order to enable an act: 

ANC1: Unless John has a lawn, John cannot mow it. 

ANC2: Unless John has a lawnmower, he cannot mow. 

ANC3: If John does not work barefoot, his toe cannot be hurt and bleed. 

Not explicitly stated but implicitly present are reasonable necessary conditions 
(RNCs). For instance it is an RNC that the scene takes place on a warm day 
in June when outdoor activities such as lawnmowing are natural. If it were 
Christmas Eve, such a reasonable necessary condition for lawnmowing would 
be violated. The reader would wonder, ask for an explanation, and try to infer 
the missing link. The missing link is often the point of the story (see below). 

Necessary conditions are that subset of possible inferences which are com- 
monly used to connect isolated conceptualizations. They are checked by 
means of establishing whether a record of them exists in memory. If they can 
be tied to information already present from previous input, an enable causation 
link is established between the previously input or inferred state and the newly 
input conceptualization with the help of general factual knowledge and nor- 
mality conditions. For instance, it is quite usual for people like John to possess 
lawns and lawnmowers. So nobody minds that they were not explicitly told that 
in the story. 

The most interesting causal link of the story establishes how John’s foot 
happened to bleed. Here, readers generate a question: What makes toes (fingers, 
etc.) bleed? They hypothesize a causal chain from lawn mowing to blades turn- 
ing to toe bleeding, infer a contact between toe and blades and link the new 
event to the story conceptualization without much trouble (see in Fig. 3.13 
proposition C6 with its result and enable link). 

As Fig. 3.13 shows, a story can be seen as made up of causal chains, some 
leading to dead ends and at least one carrying on the theme and point of the 
story. As long as the required necessary conditions can be established, the text 
is coherent and understandable. When these inferences are too difficult or im- 
possible, the reader gets into difficulties. 



C1 (day temperature (warm))
C2 (day time (june))
C3 (grass size (taller))
C4 (john propel lawnmower direction)
C5 (grass size (short))
C6 (toe state (cut))
C7 (expel blood toe)
C8 (john do)
C9 (lawnmower state (off))

Link types: enable, result, initiate / reason

ANC1: John has a lawn. 
ANC2: John has a lawnmower, etc. 
ANC3: John has a toe and he is not wearing shoes. 



Fig. 3.13. Representation of the lawnmowing story (adapted from SCHA75) 



A memory representation of the story records the results of the understanding 
process. We expect: 

• the conceptualization of each input statement 

• the connections of the conceptualizations that underlie the input state- 
ments, especially the causal chain as basic connectivity tool 

• the inferences from the input conceptualizations if they are used to 
establish the causal chain 

• the necessary conditions satisfied for every conceptualization, inferring 
from facts both inside and outside the text 

• a structure joining together various causal chains that culminate in the 
point of the story 

• dead ends that lead away from the main flow of the story. 

Reasonable first summarization rules can be given that create a summary net- 
work by pruning the causal chains: 

1. Dead end chains are forgotten. 

2. Sequential flows (correct chains) may be shortened. 

(a) The first link in the chain is the most important. 

(b) Resolution of questions or problems is, too. 

3. Disconnected pieces will be either connected correctly or forgotten. 

4. Pieces that have many connections are crucial. 

These rules basically argue that important propositions are those that partici- 
pate in the main causal chain of the story. All others may be left out in a sum- 
mary, because they are less necessary to infer a coherent representation of the 
overall event. Importance judgements are based on causal connectedness. 

Identifying a causal chain: the father, his son, and their donkey. In a re- 
search series (in particular TRAB85a,b) Trabasso, Broek, and colleagues op- 
erationalized story representations with causal networks and causal chains. 
Their global empirical result is that participation in the causal network ac- 
counts for a high percentage of the importance of a statement, and of its pres- 
ence in recall and summary. 

As an example we take the story of The father, his son, and their donkey (see 
Fig. 3.14). It belongs to a corpus of stories from Japan and China that was in- 
vestigated earlier by BROW77, who also provided empirical importance rat- 
ings for every statement of the story (see row Average importance in Fig. 3.14), 
by asking four adult raters to assess the statements and calculating the mean 
score. 

For the sake of simplicity, TRAB85a present the surface formulation of 
statements, although they argue about the underlying propositions. They de- 
scribe in detail how they infer the causal relationships between pairs of propo- 
sitions. In principle, an event A is said to be necessary to event B if it is the 
case that had A not occurred then, in the circumstances, B would not have oc- 
curred. In the example story, event (53) is said to have caused event (54) 
since if the donkey had not disliked being tied, then it would not have kicked 
ferociously. Event (54) caused event (55) since if the donkey had not kicked 
ferociously, it would not have broken the rope. 

Pause unit  Average importance  Statement
 1.         3.35                A father and his son
 2.         3.41                were taking their donkey to town
 3.         2.71                to sell him
 4.         1.29                at the marketplace.
 5.         1.24                They had not gone a great distance,
 6.         2.65                when they met a group of pretty maidens
 7.         1.24                who were returning from the town.
 8.         1.41                The young girls were talking and laughing,
 9.         1.94                when one of them cried out, “Look there,
10.         3.00                did you ever see such fools,
11.         3.47                to be walking alongside the donkey when they might be riding on it?”
12.         1.82                The father, when he heard this,
13.         3.41                told his son to get up on the donkey,
14.         1.94                and he continued to stroll along merrily.
15.         1.29                They travelled a little further down the road,
16.         2.76                and soon came upon a group of old men talking.
17.         1.59                “There”, said one of them,
18.         2.00                “that proves what I was saying.
19.         3.53                What respect is shown to old age in these days?
20.         3.18                Do you see that idle boy riding the donkey,
21.         3.18                while his father has to walk?
22.         2.41                You should get down
23.         3.06                and let your father ride!”
24.         2.94                Upon this, the son got down from the donkey
25.         3.35                and the father took his place.
26.         1.12                They had not gone far
27.         2.41                when they happened upon a group of women and children.
28.         2.76                “Why, you lazy old fellow,
29.         2.76                you should be ashamed,”
30.         1.35                cried several women at once.
31.         2.82                “How can you ride upon the beast,
32.         3.59                when your poor little boy can hardly keep up with you?”



Fig. 3.14 first part. The father, his son, and their donkey. Story statements with 
importance scores (from TRAB85a) 

Pause unit  Average importance  Statement
33.         3.47                So the good-natured father hoisted his son up behind him.
34.         1.59                By now they had almost reached the town.
35.         1.65                “Tell me friend”, said a townsman,
36.         2.21                “is that donkey your own?”
37.         1.94                “Why yes”, said the father.
38.         2.41                “I would not have thought so,” said the other,
39.         3.18                “by the way you overwork him.
40.         2.29                Why, you two are strong
41.         3.47                and are better able to carry the beast than he is to carry you.”
42.         2.71                “Anything to please you sir,” said the father,
43.         2.41                “we can only try.”
44.         3.00                So he and his son got down from the donkey.
45.         2.88                They tied the animal’s legs together,
46.         1.65                and, taking a pole,
47.         3.47                tried to carry him on their shoulders
48.         1.59                over a bridge
49.         1.41                that led to the marketplace.
50.         2.00                This was such an odd sight
51.         2.65                that crowds of people gathered around to see it.
52.         2.18                and to laugh at it.
53.         3.06                The donkey, not liking to be tied,
54.         2.71                kicked so ferociously
55.         2.94                that he broke the rope
56.         2.88                tumbled off the pole into the water
57.         3.00                and scrambled away into the thicket.
58.         1.12                With this,
59.         2.65                the father and his son hung down their heads
60.         1.94                and made their way home again,
61.         3.65                having learned that by trying to please everybody,
62.         3.82                they had pleased nobody,
63.         3.76                and lost the donkey too.



Fig. 3.14 continued. The father, his son, and their donkey. Story statements with 
importance scores 

Fig. 3.15. Causal network representation for The father, his son, and their donkey (adapted from TRAB85a) 

The authors distinguish the following types of relation between statements 
(taken from WARR79): 

• motivation 

• psychological causation 

• physical causation 

• enablement 

• temporal succession 

• temporal coexistence 

For instance, a physical cause involves naive interpretations of causality be- 
tween objects and/or people. The donkey’s kicking (54), given our knowledge 
of the physical world, causes the rope to break (55). Enablements involve ac- 
tions, occurrences, or states which are necessary but not sufficient to cause 
other actions or states. For example, statement (48) enables statement (56). In 
temporal succession, two events happen in sequence but are not causally re- 
lated. It is also not the case that the first event enables the second event. In 
temporal coexistence, two events happen at the same time but are not causally 
related. This is the case, for instance, in statements (6) and (8): The fact that 
the girls there are laughing and talking is not necessary for one of them to call 
out. The distinction between motivation and psychological causation is given 
mainly by the difference between goal-directed and non-goal-directed actions. 

The causal chain begins with an opening. It consists of those events which 
introduce the protagonist(s), set the time and locality, and initiate the story’s 
action. The closing of a story is defined by the statements that indicate goal 
attainment; if the attempts fail, the causal chain ends with the direct conse- 
quences of the failure. 

Once opening and closing are defined, we trace events via causal connec- 
tions from the opening to the closing events. Those events which have causes 
and consequences leading from the opening to the closing are on the causal 
chain. Those events which lack causes or which do not eventually lead to the 
closing events are “dead-end” events (following SCHA75). The causal chain, 
generally, is the longest chain of events through the story. 
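
The tracing procedure can be sketched as a reachability test on a directed graph. The Python code below is a minimal illustration under stated assumptions: the event numbers and links are a hypothetical toy network, not the network of Fig. 3.15, and the sketch marks every event that lies on some path from an opening event to a closing event rather than selecting one longest chain.

```python
from collections import defaultdict


def reachable(start, edges):
    """Return all events that can be reached from the start events via the edges."""
    graph = defaultdict(set)
    for cause, effect in edges:
        graph[cause].add(effect)
    seen = set(start)
    stack = list(start)
    while stack:
        event = stack.pop()
        for nxt in graph[event]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def causal_chain(events, edges, opening, closing):
    """Split the events into those on the causal chain and the dead ends."""
    forward = reachable(opening, edges)                          # follows from the opening
    backward = reachable(closing, [(b, a) for a, b in edges])    # leads to the closing
    on_chain = forward & backward
    dead_ends = set(events) - on_chain
    return on_chain, dead_ends


# Hypothetical toy network: 4 leads nowhere, 6 has no cause in the story.
events = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (3, 5), (2, 4), (6, 5)]
on_chain, dead_ends = causal_chain(events, edges, opening={1}, closing={5})
```

In the toy network, event 4 is a dead end because it never leads to the closing, and event 6 is one because no chain from the opening reaches it.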

Figure 3.15 shows the causal network derived from our sample story. Events 
on the causal chain are circled, dead-end events are left uncircled. Dead-end 
events tend to be referenced less frequently, simply because they are not 
presupposed in later statements. For the events on the causal chain, the number of 
connections varies. Event 62, for instance, is accessed by 10 inferential links, 
event 61 by 6. In the network, both events are more connected than other 
propositions. This makes them more central and more important. 
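
Counting connections is a simple tally over the links of the network. The following Python sketch assumes that a link counts once for each of the two events it joins; the text does not settle whether incoming and outgoing links are weighted alike, and the toy links are hypothetical rather than taken from Fig. 3.15.

```python
from collections import Counter


def connectivity(edges):
    """Count, for every event, the number of links it takes part in."""
    counts = Counter()
    for cause, effect in edges:
        counts[cause] += 1
        counts[effect] += 1
    return counts


# Hypothetical toy links between numbered events.
toy_edges = [(1, 2), (2, 3), (2, 4), (3, 5)]
counts = connectivity(toy_edges)
most_connected = max(counts, key=counts.get)
```

Sorting the events by this count gives a rough ranking of how central each one is in the causal plot.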

Indeed the authors found by regression analysis that a large part of the vari- 
ance in the importance judgements was explained by the factors connectivity 
(the number of links to other items) and the participation in the causal chain. 
The factors coincide to a certain degree. Both the causal chain and the causal 
connections score describe the importance of events as “being strongly 
involved in the causal plot” of the story. 

According to the results of TRAB85a,b we cannot assume, a priori, that a 
superordinate goal (or proposition) is the most important one even though it may 
be placed at the top of a hierarchical scheme. The degree to which goals are 
put in summaries depends on their causal role rather than upon their hierarchi- 
cal status. In the investigation, superordinate goals were summarized more of- 
ten than subordinate goals only when these goals were in the causal chain and 
had more connections than subordinate goals. In general, superordinate goals 
will have more causal consequences than subordinate goals, and causal con- 
nectedness is a good indicator for text-oriented importance of story events. 



3.4.3 Importance for communication 

As soon as an author does more in a discourse than reproduce an act, an event
or a state of affairs, the relevance of the meaning units can no longer be
derived merely from their role in the described acts. Otherwise, all meaning
units that do something other than describe the act would have to be regarded
from the outset as unimportant. But in many cases, the things used for actions
are not automatically known to all addressees and must be explained. These
explanations are of obvious importance to readers.

Our readers may imagine instructions about how to set up a Zen garden, or a 
report on how this was done. Since most readers are not acquainted with the 
principles governing Zen gardens, they will gratefully accept an explanation of 
Zen garden essentials before they follow the report or the instructions, and they 
will find it important. Generally speaking, the information packages that we 
absorb must often comply with more communicative functions than the central 
one, for instance the report or instructional function. This is not new. Standard 
discourse structures (superstructures) accommodate different types of informa- 
tion for the different communication needs. 

So does the story of Circle Island (from THOR77 - Fig. 3.16), which has as
long a research tradition as the stories of The countryman and the serpent
and The father, the son, and their donkey.

In contrast to the stories analyzed above, the Circle Island story comprises 
an introduction, which familiarizes the reader with the location and the current 
political situation of Circle Island. Readers of the other stories did not need an 
exposition of this kind, since normal commonsense was sufficient for under- 
standing. Unlike Aesop, however, the author of the Circle Island story must in- 
troduce his readers to an unknown developed society. To understand the farm- 
ers’ anger it is essential to know how the island is governed. Readers must 
learn about these facts, otherwise they cannot make the right inferences and 
form a correct picture of the situation on Circle Island. The author, in her or his 
function as communicator, takes the justified wishes of the audience into 
account. Before reporting the events on Circle Island, (s)he introduces the
reader to the situation on the island.

(1) Circle Island is located in the middle of the Atlantic Ocean,

(2) north of Ronald Island. 

(3) The main occupations on the island are farming and ranching. 

(4) Circle Island has good soil, 

(5) but few rivers and 

(6) hence a shortage of water. 

(7) The island is run democratically. 

(8) All issues are decided by a majority vote of the islanders. 

(9) The governing body is a senate, 

(10) whose job is to carry out the will of the majority. 

(11) Recently, an island scientist discovered a cheap method 

(12) of converting salt water into fresh water. 

(13) As a result, the island farmers wanted 

(14) to build a canal across the island, 

(15) so that they could use water from the canal 

(16) to cultivate the island’s central region. 

(17) Therefore, the farmers formed a procanal association

(18) and persuaded a few senators 

(19) to join. 

(20) The procanal association brought the construction idea to a vote. 

(21) All the islanders voted. 

(22) The majority voted in favor of construction. 

(23) The senate, however, decided that 

(24) the farmers’ proposed canal was ecologically unsound. 

(25) The senators agreed 

(26) to build a smaller canal 

(27) that was 2 feet wide and 1 foot deep. 

(28) After starting construction on the smaller canal, 

(29) the islanders discovered that 

(30) no water would flow into it. 

(31) Thus the project was abandoned. 

(32) The farmers were angry 

(33) because of the failure of the canal project. 

(34) Civil war appeared inevitable. 



Fig. 3.16. The Circle Island story (from THOR77) 



As a result, the discourse structure of the Circle Island story comprises more 
than the sequence of events described. Statements 1-10 (the setting - see Figs. 
3.16 and 3.17) are addressed directly to the audience and create the preconditions
for understanding the story. The recipients of a summary also need this orienta- 
tion. It should therefore be included. The Circle Island story demonstrates that 
what is important in a narrative is not only determined by the event structure. 
Those elements of the discourse structure that go beyond a description of 
events, for example the introduction, also contain information that is relevant
for summary users. 

Text components that are important for communication are often familiar as 
standard elements of discourse types. This is what enables us to refer to the 
discourse schema (or superstructure) in order to substantiate the relevance of 
the respective unit of information, saying for instance that information from the 
setting is important. A discourse schema can be formulated in the form of a 
grammar, as seen in Fig. 3.18. The story grammar embeds categories such as 
episode, attempt, event, and state in categories that characterize discourse 
components and are communicatively substantiated: setting, theme, plot, reso- 
lution, characters. 




Fig. 3.17. Scheme of the Circle Island story (from THOR77) 



It is normal for a discourse to contain information that goes beyond its central 
object. The extremely simplified stories that were used to investigate text 
representations and their relations with a summary are the exception rather 
than the rule. Additional information may be important, because with this the 
author shows understanding for her or his readers, guides them to the central 
theme, possibly creates conditions for understanding, demonstrates application 
possibilities, etc. Even in a summary, it is not possible to do without the 
communicative function especially of introductory and concluding text pas- 
sages. Well-advised summarizers take discourse structures and communicative 
functions into account.

The experiments of THOR77 reveal the use of the conventional presentation
sequence of discourse components described by the story grammar. The story is 
easier to understand if the author introduces the readers into the world of the 
action, i.e., if first of all the setting is established and then the problem or 
theme is stated, before the real tale begins. This enables readers to activate 
their own knowledge schemata step by step, as needed in order to interpret the 
story. If the sequence of text components is altered, for example by not stating 
the theme until after the story, understanding and summarizing become more 
difficult. The summaries are longer. Summarization broke down when the test
subjects in THOR77 were finally confronted with the Circle Island story as a
“scrambled text” (all sentences intact, but in a random order). Instead of
condensing, they wrote down 89% of the propositions contained in the text.



(1)  STORY                       ->  SETTING + THEME + PLOT + RESOLUTION
(2)  SETTING                     ->  CHARACTERS + LOCATION + TIME
(3)  THEME                       ->  (EVENT)* + GOAL
(4)  PLOT                        ->  EPISODE*
(5)  EPISODE                     ->  SUBGOAL + ATTEMPT* + OUTCOME
(6)  ATTEMPT                     ->  { EVENT* | EPISODE }
(7)  OUTCOME                     ->  { EVENT* | STATE }
(8)  RESOLUTION                  ->  { EVENT | STATE }
(9)  SUBGOAL, GOAL               ->  DESIRED STATE
(10) CHARACTERS, LOCATION, TIME  ->  STATE


Fig. 3.18. A story grammar (adapted from THOR77) 
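
The rewrite rules of Fig. 3.18 lend themselves to a simple data structure. The Python sketch below is a hedged illustration, not part of THOR77: it encodes the ten rules as a dictionary (repetition markers are dropped, which is an assumption made for simplicity) and uses a hypothetical helper, `reachable`, to list every category that can be derived from a given discourse component.

```python
# Each category maps to its alternative expansions (lists of symbols).
# Repetition markers such as EPISODE* are omitted in this simplified encoding.
GRAMMAR = {
    "STORY": [["SETTING", "THEME", "PLOT", "RESOLUTION"]],
    "SETTING": [["CHARACTERS", "LOCATION", "TIME"]],
    "THEME": [["EVENT", "GOAL"]],
    "PLOT": [["EPISODE"]],
    "EPISODE": [["SUBGOAL", "ATTEMPT", "OUTCOME"]],
    "ATTEMPT": [["EVENT"], ["EPISODE"]],
    "OUTCOME": [["EVENT"], ["STATE"]],
    "RESOLUTION": [["EVENT"], ["STATE"]],
    "SUBGOAL": [["DESIRED STATE"]],
    "GOAL": [["DESIRED STATE"]],
    "CHARACTERS": [["STATE"]],
    "LOCATION": [["STATE"]],
    "TIME": [["STATE"]],
}


def reachable(category, grammar=GRAMMAR):
    """Return the set of all symbols derivable from `category`."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for alternative in grammar.get(current, []):
            for symbol in alternative:
                if symbol not in seen:
                    seen.add(symbol)
                    stack.append(symbol)
    return seen


# Symbols that occur on a right-hand side but have no rule of their own.
all_symbols = {s for alts in GRAMMAR.values() for alt in alts for s in alt}
terminals = all_symbols - set(GRAMMAR)

print(sorted(reachable("SETTING")))
print(sorted(terminals))
```

The helper makes two properties of the grammar visible: the setting bottoms out in states only, and an episode can contain further episodes through its attempts, so plots may be nested.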



3.4.4 Situated relevance 

Introduction. In summarizing, communicators are just as much involved in a 
communicative situation as in other forms of communicative action (see Fig. 
3.1). During understanding, a recipient reconstructs a discourse and at the same 
time continues to develop his or her own conception of the world. To the best 
of their ability, the recipients perceive what the producer of the discourse 
presents as important, but reconstruct it from their own prior knowledge. In 
doing so, addressees contribute their own efforts and evaluations and decide for 
themselves what they consider important. This perspective is in sharp contrast 
with the position that importance is conveyed by the source information itself.

As soon as recipients are given a role in determining the important points of
a discourse, the relevance decision can no longer be based solely on the con- 
tents and the information organization determined by an author. Recipients 
may introduce viewpoints that are unknown to the author, because they result 
from their own prior knowledge, from a situation that differs dramatically from
what the author anticipated, and so on. Put trivially, things the recipient
already knows are of no importance and of no interest to her or him. Since an
author often knows the readers only slightly or not at all, (s)he may not be
in a position to take the prior knowledge of individual readers into account.
What is more, recipients are in a concrete situation. They may be busy with a 
specific task or very short of time. Such parameters naturally influence their 
opinion of what is important to them at the given moment. They may be 
longing for something quite different from matter-of-fact cognitive information 
acquisition: perhaps they are looking for a book to counteract the tristesse of a 
gray November afternoon that will transport them into a different world full of 
excitement. In this situation no parameter-free test method is important enough 
to capture their attention, no matter how well it is theoretically grounded. 

What is true in general also covers special types of communication. In situ- 
ated summarizing, information cannot go into a summary merely because it is 
recommended by the discourse structure. The more stringent condition is that it 
is interesting or worth knowing. Guided by their curiosity and interests, the
reader and the author of a summary decide on the relevance, importance, and
interest of meaning items. We have to distinguish, however, between writer- and
reader-based summaries (HIDI86b). Writer-based summaries are meant to 
serve their authors, especially in tasks such as studying or preparing their own 
discourses. In this case, the needs of the writer shape the communicative goals 
of summarizing. Most of the time, however, summaries are recipient-based. 
Recipients are expected to pick up the content, to restructure it with respect to 
their own prior knowledge, to integrate it into their own knowledge structure, 
and finally to use it. When producing summaries intended to serve third par- 
ties, the summarizer will ultimately respect their state of knowledge and their 
attitudes. 

In the research discussed so far, communicators have been involved only in- 
directly in importance decisions. They were test subjects assumed to reliably 
infer a mental model of the event sequence described in a story, and to judge 
as important, to recall and to summarize what the structure of the representa- 
tion suggests: elements of the causal chain or network, plot units, or the state- 
ments from the most important (hierarchically high) story categories. Readers 
were not supposed to apply their own point of view to the story they were in- 
terpreting, or to situate it in their current environment. This attitude is scien- 
tifically correct as long as everybody agrees in their judgement, but in real 
life, this is seldom the case. 

Reader-based (and “subjective”) discourse understanding and summarizing 
has not had much chance to be investigated in the standard research environ- 
ments. Classroom or laboratory situations and test stories which are clearly 
separated from the subjects’ own concerns encourage a rather disengaged atti- 
tude in recalling and summarizing. The results from such an environment are 
true for the classroom or laboratory situation, but outside them, they lack
ecological validity as criticized by BEAU82. A look at the conventional test texts
suffices to make the point clear: Indeed discourse understanding has often been 
investigated using short stories that were invented or adapted for scientific use. 
Their advantage is that they present the desired structure. However, they are 
seldom exciting; on the contrary, they are often “dismally dull” (BEAU82). 
Nobody would listen to them attentively in a natural environment, let alone 
summarize them. SCHA75 uses such an artificial story with no point in order to 
explain the well-known restaurant script (see Sect. 2.4.3): 

John went to a restaurant. The hostess seated John. The hostess gave John a 
menu. John ordered a lobster. He was served quickly. He left a large tip. He left 
the restaurant. 

Like the restaurant script, the story states what usually occurs during a visit to 
a restaurant - nothing to write home about. To make a real story out of the res- 
taurant scene, we may imagine a shooting unexpectedly taking place, or the 
wooden chair legs may slowly turn to plasticine, with the consequence that the 
guests sink to the ground because the chair legs give way beneath them. Or 
while John is waiting for his lobster, a lobster lobby may appear petitioning 
against lobster killing. Ecologically valid stories have points of this kind, sim- 
ply because the storyteller must keep the audience interested, whereas stories 
for scientific use do not share this motivation. Results about their summariza- 
tion cannot be assumed to be valid in natural (or real-world) environments. 

It is not entirely easy to draw constructive conclusions from the justified 
criticism of BEAU82. Since the meaning of statements for a situated discourse 
producer and her or his recipient can only be examined in the actual situations 
through sophisticated empirical methods, very little is known about it. With 
respect to summaries, we know even less. 

Despite the manifest gaps in the research, it is nevertheless more meaningful 
to ask about the situated relevance of a meaning unit than to stop at simpler 
relevance concepts referring to the discourse itself and its structure only. These
concepts have the drawback of excluding the people actually responsible for the
relevance decision.

In the following, we continue our discussion on the basis of familiar fables. 
Understanding a fable is not complete without grasping its moral, whether 
stated explicitly or implicit in the text. Recipients often have to make their 
own inferences in order to understand the moral of a fable. If they fail to under- 
stand it, they will not be able to summarize it correctly. In other words, a high- 
quality summary cannot be produced without active cognitive participation on 
the part of the summarizer and also the recipient, and this can be observed in 
fable understanding and summarizing. 

By including the addressees, who inevitably introduce their own viewpoints 
with regard to the situation, relevance assessments are placed on a broader ba- 
sis. Next, emotional factors are also taken into consideration, as well as indi- 
vidually different abilities and the situative framework that leads people to 
quite different principles of relevance assessment. Hence, relevance decisions 
are based on a collection of different arguments. Relevance decisions of this
kind have the advantage of being closer to reality. However, the principles of 
situated relevance judgements are still much less well developed than the ar- 
guments that can be derived from the structure of stories. 

Inferring the moral of a fable. Fables are simple narrative and didactic texts 
in which positive or negative moral actions are paired with like-valued 
outcomes for the purpose of a moral generalization (DORF94). They occur in a 
just world where the good are rewarded and the bad have to pay the price, as 
in the following summary of the hare and the tortoise story: 

A tortoise and a hare started to dispute which of them was swifter, and, before 
separating, they made an appointment for a certain time and place to settle the 
matter. The hare had such confidence in its natural fleetness that it did not 
trouble about the race, but lay down by the wayside and went to sleep. The tor- 
toise, acutely conscious of its slow movements, plodded along without ever 
stopping until it passed the sleeping hare and won the race. 

This summary seems weak, because it withholds the moral from us, something 
like “Do not be overconfident, be diligent and try!”, inferred by generalization 
of the brave and successful behavior of the tortoise. Since the canvas of the 
just fable world guides inferencing, readers easily come up with the right solu- 
tion. 

In the Aesop fable The serpent and the countryman cited above, the serpent 
provides the story’s point by resisting the countryman’s peace offer with a 
clever argument. The point is so central that it is included in a summary, even 
if the summary consists of only one sentence. The reader sides with the realism 
of the snake, accepting the generalization that bad experiences are burnt into 
memory and cannot be dismissed at will. In the story about The father, his son, 
and their donkey, the overall goal (to sell the donkey) is never achieved, and 
the main point and lesson of the story (“Allen Leuten recht getan ist eine 
Kunst, die keiner kann” - “You cannot please all of the people all of the 
time”) must be inferred from the basic chain of events where the father does 
not resist the loose tongues along their way and thus does not manage to bring 
the donkey to town. The moral is universal enough to be preformulated in 
European proverbs, too, although the story comes from Eastern Asia. 

An investigation by DORF94 showed that people are indeed able to infer the 
points of fables as long as the plot is clearly either good or bad and the out- 
come conforms to the just world assumption. Consequently, a complete ac- 
count of understanding and summarizing fables must include the cognitive 
achievements of the comprehender / summarizer and the summary user. 

The cognitive activity of recipients is not optional but necessary for success- 
ful communication. The author of a fable expects the reader to infer the moral 
and the reader must conform to the author’s expectation. The moral is the cen- 
tral point of the fable, the communicative reason that makes it worth telling. 
With recipients who fail to infer the moral, the story must seem pointless and 
cannot achieve its communicative function. The communication act fails because
of missing recipient-based activity.

If the point is what the story is about, we must assume that the conceptual 
representation of a story in memory is structured accordingly. We cannot leave 
the point out with the argument that it is not explicitly stated. Readers perform 
well; they consistently infer it and organize their mental model of the story 
event around it. To know the point is a prerequisite for a good summary. 

Incidentally, the example of inferring fable points shows that modeling the 
cognitive activities of recipients is feasible, at least in the simple case of un- 
derstanding and summarizing fables. 

Stories of interest. Not all discourses, and not all stories, are justified in a 
communication situation because they have a lesson to teach. Stories are - 
according to BREW82 - primarily designed to entertain and achieve this 
function by producing affective states in readers, such as suspense, surprise, 
and curiosity. The affective states are created by particular presentations of 
events which may or may not correspond to the order of the actual events they 
describe. For instance, in a mystery story curiosity is created by omitting 
important events at the beginning of the story. Audiences like exciting stories. 
The more informative and interesting the plot-line of a story, the more likely 
the story is to be rated as good (BREW82 and others). BEAU82 considers in- 
terest to be one of the crucial factors affecting readers’ reactions: an 
entertaining story will be acceptable without a message, while a dull one may 
lose its readers before any message can be delivered, let alone enough for a 
summary. What lacks attraction risks being dropped as unimportant, simply by 
not being perceived. 

There is no reason to blame people for concentrating their cognitive energy 
on things that seem interesting to them. In everyday situations, they cannot 
fully process every word they hear or read. They must decide what to pay seri- 
ous attention to as they go (SCHA79). Serious attention means interest, in 
other words that we let our inference processes loose. Interest involves an in- 
crease in cognitive effort - information search, inferencing, etc. - to augment 
knowledge and understanding about what we learn. What is central to the re- 
sponse of interest is that a person increases intellectual activity to cope with 
the greater significance of incoming information. Only what arouses interest 
may be found important. Interesting things have good chances of appearing as 
more important than uninteresting things because uninteresting things risk not 
being properly processed. 

Basically three conditions elicit interest in a reader (SCHA79): 

• schema-congruent expectations are violated or deviated from 

• schema-relevant information is missing 

• content refers to salient themes (e.g., death)

The first case occurred above, when the lobster party intruded in the standard
American restaurant scene. This is surprising and possibly more exciting for the 
guests than the menu. Understanders react in sympathy with the guests and pay 
attention to the noteworthy event, relegating the script-governed restaurant 
scene to the background. Comprehenders deal with the interesting escape from 
the current schema without trouble, interpreting the lobster party’s entrance 
according to a related schema, for instance of students or workers petitioning, 
and aggregating it with the restaurant scene. 

In the second case, the activation of a schema creates interest in the infor- 
mation which is missing but potentially relevant. For example, in reading 
about an assassination, interest in the identification of the victim and the as- 
sassin would be heightened. Such information happens to be of high impor- 
tance in the event structure underlying a murder. Interestingness and event 
structure are intimately related. 

Salient themes may be salient for many reasons. They may be life themes 
like death and marriage, controversial topics in politics, or other hot issues 
which possibly stimulate not only interest but also anger or other feelings. Life 
themes such as death and power may interest almost anybody in any context, 
whereas others, such as soccer goals, are of extreme interest only in the re- 
spective environment. 

Cognitive interest. Interest may arise from the relationship between incoming 
knowledge and background knowledge, and it may be induced by the elicita- 
tion of a direct emotional response (KINT80). Cognitive interest comes from 
events which are interesting because of the roles they play in some complex 
cognitive structure, or the surprises they hold, whereas emotional interest is 
created through events which have an arousal function such as violence and 
sex. 

The cognitive interest value seems determined by the interaction of three 
factors: how much we know about the subject matter, the degree of uncertainty 
the input discourse generates in the reader, and “postdictability”, that is, how 
well the information can be meaningfully related to other sections of the dis- 
course. Among the three factors, the reader’s knowledge structure plays the 
major role. The other factors, uncertainty and postdictability, depend on 
knowledge-based expectations. We can assume cognitive interest to be a non- 
monotonic function of two of the factors: knowledge and uncertainty, peaking 
at the right amount of both of them. Relatively small deviations from expecta- 
tions are optimal in creating interest. If a situation or event is as expected, in- 
terest must be lacking. If it is too unfamiliar, the new information cannot be re- 
lated to the existing knowledge structure and no interest is aroused. 
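
The non-monotonic relationship can be sketched as an inverted-U curve. The following Python fragment is a toy model under loud assumptions: the text gives no formula, so the quadratic form, the scaling of both factors to the interval [0, 1], and the peak at 0.5 are invented for illustration, and `inverted_u` and `cognitive_interest` are hypothetical names.

```python
def inverted_u(x):
    """Toy inverted-U curve on [0, 1]: zero at both ends, maximal (1.0) at 0.5."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    return 4.0 * x * (1.0 - x)


def cognitive_interest(knowledge, uncertainty):
    """Assumed interest score: product of two inverted-U curves.

    knowledge   -- how much the reader knows about the subject (0 = nothing)
    uncertainty -- how much uncertainty the discourse generates (0 = none)
    """
    return inverted_u(knowledge) * inverted_u(uncertainty)


# Fully expected event, moderate deviation, and completely unfamiliar material.
for knowledge, uncertainty in [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]:
    print(knowledge, uncertainty, cognitive_interest(knowledge, uncertainty))
```

Under these assumptions the score vanishes both when everything is as expected and when the material is entirely unfamiliar, and it is highest in between, which mirrors the qualitative claim of the paragraph without implying any measured values.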

Emotional interest. Everybody knows that events increase in interest if they 
involve friends or relatives. Novels have heroes with whom readers may iden- 
tify and whom they may follow through the adventures of the plot. In both 
cases the readers engage themselves emotionally and live with the friend or
the sympathetic hero. If the sequence of events takes readers through exciting 
environments (exotic countries, the outer margins of our galaxy, etc.), their 
emotional involvement increases as well, because the events are more anoma- 
lous and exciting to them. Going through a dramatic human situation such as 
deciding between two lovers, or between adventure and ordinary life at home, 
is also emotionally stimulating. 

Emotional interest is frequently involved in recipient-based decisions about 
importance. Authors are assumed not to differ too much from their readers. In 
many cases, their intention is to touch the sensitivity of their readers, as Har- 
riet Beecher Stowe does in her book Uncle Tom's Cabin, to frighten them like 
Edgar Allan Poe in his mystery stories, or to convey to them other emotional 
experiences. If readers and authors agree that having a strong sentiment of 
compassion or thrill is the central communicative aim of a discourse, then the 
emotional message is important and must show up in a summary. 

The fact that readers react to stimuli of emotional interest is known to writ- 
ers, particularly to journalists and playwrights, and influences the way they 
present contents. Readers may invest more emotions in a detective story or the 
report of a basketball game with their own team involved than in a scientific 
paper or a textbook chapter. However, expository and scientific texts also con- 
tain elements of emotional communication. The present book is no exception 
to this rule. We hope that varied examples, a mix of graphic and written means 
of presentation, and an included CD-ROM simulation system make it not only 
easier but also more pleasant to learn something about summarizing. 



3.4.5 Interpersonal and situational differences in importance ratings 

Interpersonal differences. So far, it may have looked as if all people were 
equal with respect to their judgements of importance. This is of course not the 
case. One sort of interpersonal difference can be seen from experiments with 
good and poor readers reported by WINO84. One of his test texts deals with the
problems of American cities in the 19th century (see Fig. 3.19). It illustrates 
the significantly different importance selection pattern of fluent and less fluent 
readers in the eighth grade. Sentences in bold italics were selected by signifi- 
cantly (p<0.05) more poor readers than by good readers or adults. Sentences in 
boldface were selected as important by significantly (p<0.05) more good rea- 
ders and adults than poor readers. 

WINO84 finds that good readers are better judges of importance than poor
readers when importance is defined in adult terms as textual importance. Poor 
readers tend to select sentences as important if they are rich in visual and con- 
crete detail. They choose as important information of high personal interest, 
not the kinds of information the author staged as more important in the passage.
Whereas good readers put the sentences which they have judged important
in the summary, poor readers exhibit much less consistency between what
they rate as important and what they include in their summaries. We have to 
conclude that the skill of importance rating depends on the general intellectual 
faculties of the person, that people with more developed cognitive skills obey 
different standards than intellectually less proficient individuals, and that the 
central role of importance judgements in summarizing is not self-evident to 
everybody. Reversing the argument, we may think about tailoring summaries to 
recipients who need much imagery and action, in order to allow them to grasp 
their part of the discourse content, and other summaries to those who work with 
the more abstract insights conveyed by an article such as that about the 
problems of American cities in the 19th century. 



Cities in the 1800s 

In the last years of the 1800s, cities in the United States were growing 
faster than anyone had ever dreamed was possible. But as the cities grew, so did 
the problems. 

One problem was slums, with crowded, dirty apartment buildings called tenements. In the 
slums, diseases spread quickly when people got sick. In crowded slums, people
threw garbage out the windows, where it grew into huge heaps in the 
streets and alleys. Insects and rats in the garbage caused more sickness 
in the slums. 

With so many people in cities, garbage suddenly became a problem outside the slums. No 
one guessed that cities would ever have to find ways to collect the garbage. Why, even in 
New York, the biggest city in the country, garbage had always been eaten by pigs in the 
street. 

New buildings went up almost overnight. Many were poorly made and jammed close together.
Most were made at least partly from wood. The danger of fire increased. Cities began to
suffer from terrible fires that quickly burned down entire neighborhoods. Chicago had 
one of the worst fires. Most of the city was destroyed and hundreds of 
people were killed or hurt. If cities were going to be made safe, buildings had to be 
made better, and good fire departments were needed.

Crime was another city problem. Oh, there had always been criminals. But like 
other people, criminals seemed to be especially attracted by the city. The only difference 
was that the criminals came for different reasons. Large numbers of people and businesses 
provided more targets for thieves. And great crowds made criminals hard to catch. Some- 
times a gang would take over a neighborhood in the city and even the 
police were afraid to go there. 

Nearly everyone could see that the new cities needed help. But many people believed 
that it was not the job of the city government to solve the new problems 
like slums or garbage or crime. 



Fig. 3.19. Cities in the 1800s - importance judgements of poor and good
eighth-grade readers and adults



Differences in age and academic training are of course not the only interpersonal
differences affecting importance ratings. Other differences may result
from prior knowledge about facts (if I do not know what an aardvark is, how can I
assess the importance of propositions about aardvarks?) or beliefs (if I do not 
believe in ghosts, why bother assessing any statement about them? - they are 
all irrelevant), and perhaps many other interpersonal differences. People dis- 
agree in their judgements about importance for obvious and legitimate reasons. 
The more summaries are adapted to the respective user profile, the better they 
serve their users. 

Different situations - different interests and importance ratings. In communication,
individuals take account of the situation creatively. Let us imagine
an American lawyer. During the day at the office, she judges importance
in her clients’ lawsuits according to the standards of US law, defining the
important features with respect to the case method (see SUTT94):

The basic pattern of legal reasoning is reasoning by example. It is reasoning 
from case to case. It is a three-step process described by the doctrine of prece- 
dents in which a proposition descriptive of the first case is made into a rule of 
law and then applied to the next similar situation. The steps are these: similar- 
ity is seen between cases; next the rule of law inherent in the first case is an- 
nounced; then the rule of law is made applicable to the second case. (LEVI49 
cited according to SUTT94) 

In this frame of reasoning, only what is related to a precedent is important and 
can help to win a lawsuit. What is not merits much less attention. A lawyer 
thinks of course in conformity with the mental model of US law that she shares 
with her colleagues. Now let the lawyer come home late in the afternoon. At
home, one of her problems that day is finding out whether it makes sense to take
her 12-year-old daughter to a performance of the Flying Dutchman. The mother
summarizes and explains the plot from the viewpoint of a teenager. She talks 
about the wild Norwegian fjord landscape, the adventurous life of sailors in 
past centuries, and about the weird tale that a sailor could not get eternal rest 
unless a loving woman sacrificed herself for him as did Senta for the Flying 
Dutchman. On the stage, there would be real, huge ships, and sea waves, and 
marvellous choirs. What the mother highlights are the attractions of the Flying 
Dutchman from the horizon of a 12-year-old American, i.e., things that are
important with respect to the opera and for the addressee. The mother’s summary
may not differ very much in style from a description of a TV movie. She leaves 
out what cannot be understood without a greater acquaintance with the history 
of ideas in Europe, she will probably not talk of Ahasver reincarnated in the 
Flying Dutchman, and so on. The frame of reference the mother uses in as- 
sessing what is important and interesting is completely different from what she 
used in her function as a lawyer by day. In both cases, importance judgements 
follow different avenues, but they are explainable within their respective situa- 
tions. 

An exploratory analysis of the mental model of US attorneys in law is given by
SUTT94; the mental maps of mothers convincing their daughters to come to 
the opera are left to the reader’s imagination. Here, the conclusion is that in- 
deed the same person can refer to very different frameworks of factual knowl- 
edge, responsibilities, beliefs, and perhaps other items (s)he is convinced of 
when deciding about importance, and that the frameworks may change dra- 
matically during a normal working day. Therefore it would be unwise to banish 
the influence of situations from the study of summarizing and of relevance as- 
sessment. 

Fig. 4.1. Summarizing in a professional information environment



Most people occasionally summarize information when they report about a 
movie or the discussions of a meeting. Students often summarize course mate- 
rials in order to understand and memorize them better. There are also profes- 
sions where summarization tasks occur without being regarded as main activi- 
ties, for instance in journalism. Reporters summarize, for instance, a parlia- 
mentary debate or the state of affairs in any other domain, say the financial 
situation of the national social security system. Researchers typically begin 
their papers by summarizing the state of knowledge, in order to make their own 
contribution more easily accessible to readers. In both professions it is
advantageous to master the most important summarization techniques, but neither
journalists nor researchers regard themselves as summarization professionals.
Their main job is to find and to transmit knowledge. Non-specialist summarizing
like theirs has been treated in the previous chapter.

There are careers, however, for which the summarization of knowledge is a 
core qualification. This is true of information officers (indexers and abstractors) 
who produce information records for users of information systems. At first 
glance, professional summarizers do nothing different from occasional sum- 
marizers: they reduce a large information unit so that only the most relevant 
points are retained. But on closer inspection, professional summarizers can 
summarize the same information with greater competence, speed, and quality 
than non-professionals, much as a group of professional actors may play the 
same piece as an amateur theater group, but with differences in competence 
and performance. It is plausible that more training in summarization makes it 
possible to achieve a better performance. In particular, professional sum- 
marizers are expected to have a command of more and better summarization 
techniques than persons who only occasionally happen to summarize a text. 

Figure 4.1 shows the environment of summarization professionals, that of in- 
formation systems. The information environment differs in some respects from 
the communication situations of summarizers which we discussed earlier: 

• Professional summarizers are provided with explicit summarization methods. 
One of the main differences between amateur and professional perform- 
ance lies in the application of well-developed summarization techniques. 
Therefore, knowledge of methods is ascribed to the professional summarizer
in Fig. 4.1. A corpus of methods is available in the field. We assume
that summarizers refer to it. Some of this is experiential knowledge,
the fruit of training on the job. Other methods are taken from standards, 
textbooks, and guidelines. Whereas the occasional summarizer is not ex- 
pected to follow explicit summarization methods, the professional obvi- 
ously knows and applies many of them. 

• Professional summarization deals almost exclusively with professional and 
technical discourses. In professional environments, mostly expository texts 
are summarized, although action-oriented text types occur as well. Their 
content is professional or technical while presentation formats differ. Writ- 
ten text is frequent, but it is often accompanied by image information. 
Whereas the contents of everyday discourses are not presumed to be diffi- 
cult to understand, technical documents may be hard to grasp even for ex- 
perts in the field. Thus professional knowledge is a precondition for com- 
munication; both the summarizer and the summary user must have access 
to it. 

• For professional summarizers, efficiency and technical support are major
issues. Summarizing professionals do cost-intensive work when they abstract
and index. Under pressure from their employers or on their own initiative they
strive to be effective. Professionals may receive organizational and technical
support which is out of reach for most summarizers in everyday environments,
particularly in the form of computerized equipment and intellectual tools 
such as guidelines and reference works. 

• A professional information environment is organized. It imposes professional 
roles. Whereas in everyday communication situations the communicators 
may change roles, this is normally not possible in a professional environ- 
ment. Summary producer and summary user might never communicate in a 
face-to-face direct style. Physical distance may preclude a real conversa- 
tion. More typically they communicate through technical media where the 
summarizer corresponds to the sender and the user receives a message, the 
summary. Apart from the communication conditions, the professional roles 
of both parties and their training differences also prevent turn taking as 
found in everyday communication. Summaries are produced for a more or 
less open market, the information system. Users are seen as information 
consumers and normally apply what they have learned to their professional 
tasks. Later, they may produce new source information in the form of 
memos or technical papers. Producing summaries for a not very well- 
known audience is more risky than summarizing for a communication 
partner who is present and can ask whenever (s)he feels that communica- 
tion fails. 

Like all communicators, professional summarizers react to the situation they 
are immersed in. In our case the information system is the working environment 
of the summarization professionals whose expertise is described in the 
following. The information system provides an organizational and technical 
framework for their work as communicators, establishing aims, activity spaces 
and limits. To safeguard understanding of the communication situation for all
readers, Fig. 4.2 outlines a classical bibliographic information system. It may
be realized by different and mixed technical implementations, integrating
human and computer contributions, and different network sites. As the technical
possibilities of information systems develop, presentation of information
units can become more varied. They may include all available media.

The information system functions as a communication medium where those 
interested can gain an overview of the information on offer and choose items 
they are interested in for their current tasks. It presents information users with 
abridged representations (summaries) of the documents they may study in de- 
tail later. Thus it enables them to look at plenty of documents before restricting 
their interest to the most relevant ones. All these representations are pooled in 
the database, which is equipped with a retrieval mechanism (see Fig. 4.2). 
Since the representations are indexed according to the same system vocabulary 
as the statements of information demand, the retrieval mechanism can work by 
simply matching the demand representation (query) against the representations 
of available information units. Full-text search is currently offered. Without a
(desirable) full-text system, only summaries are directly available from the
information system. The short representations contain a code which guides 
users to the source document in an external library, computerized or not. 
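
The matching step described above can be illustrated with a minimal sketch. It assumes that each short representation carries a set of descriptors from the system vocabulary and a locator code for the source document. The names `Record`, `retrieve`, and the `LIB-...` codes are hypothetical, and ranking hits by the number of shared descriptors is an added assumption; the text only states that query and representations are matched.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Short document representation held in the database (hypothetical layout)."""
    locator: str                      # code guiding the user to the source document
    abstract: str                     # the summary shown to the user
    descriptors: set[str] = field(default_factory=set)  # terms from the system vocabulary


def retrieve(query: set[str], records: list[Record]) -> list[Record]:
    """Return records sharing at least one descriptor with the query.

    Ordering by the number of shared descriptors is an assumption made
    for this sketch, not a property of any particular system.
    """
    scored = [(len(query & r.descriptors), r) for r in records]
    hits = [(n, r) for n, r in scored if n > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


records = [
    Record("LIB-001", "Survey of abstracting guidelines.", {"abstracting", "standards"}),
    Record("LIB-002", "Thesaurus construction for indexing.", {"indexing", "thesauri"}),
    Record("LIB-003", "Cognitive processes in indexing and abstracting.",
           {"abstracting", "indexing", "cognition"}),
]

result = retrieve({"indexing", "cognition"}, records)
print([r.locator for r in result])
```

Because query and records draw on the same vocabulary, a plain set intersection is enough here; free-text queries would need an additional translation step into descriptors.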

Professional summarizers prepare abstracts and indexes to be included in the 
short document representation records in the information retrieval system: 

• Abstracts are summaries that adapt to the functionality of a particular sys- 
tem. Rules may prescribe the text length, the fonts, and other presentation 
details, forbidding items that cannot be processed. The abstract is meant to 
communicate the essential content of the document to users. 

• Indexing corresponds roughly to a summarization by catchwords in every- 
day communication. In information systems it is governed by its retrieval 
function. Thus we have system-generated constraints that may, for in- 
stance, prescribe the use of codified descriptors from the system vocabu- 
lary. Their meaning may be defined inside the system and differ from nor- 
mal use. 




Fig. 4.2. Organizational framework of a bibliographic information system (adapted 
from LANC91). The dotted box includes the contribution of summarizers. 



Both abstracting and indexing are regulated by standards and system-specific 
guidelines. It is a core qualification of a professional summarizer to comply 
with these prescriptions (see the methods mentioned in Fig. 4.1). 

In the following, we display the available opinions about the intellectual pro- 
cesses of professional summarizing, i.e., of abstracting and indexing. First, we 
review the cognitive approaches known to the author, then we change the 
information source and look at original data. 

Many researchers have written about indexing and abstracting. There is good
and recommendable recent work (e.g., LANG89, MOLI94, CREM96). Standards
are being revised (ANSI79) or are reasonably up-to-date (ISO85 and
DIN88), and international conferences discuss new developments in the field.
However, from a cognitive science point of view, the published state of 
knowledge about professional summarizing has limitations. It emphasizes 
technical and external features of indexing and abstracting (the application of 
a certain thesaurus, the handling of abbreviations or author-provided abstracts, 
etc.) while the intellectual processes themselves - also known as “subject 
analysis” (LANG89) - are not considered very much, although they are at the 
core of the summarization processes: 

Subject analysis is the first, the most important and the most difficult part of 
all classification and indexing. No retrieval system can be better than the sub- 
ject analysis on which it is based. (...) In fact, the matter is usually approached 
indirectly and there has never been a previous volume devoted to it. (LANG89) 

Cognitive scientists must note the poor treatment of the respective cognitive 
processes. Their argumentation is supported by information scientists (JONE83, 
BEGH86, ROWL82/88, LANG89, MOLI94/95, and others). Cognitive science 
approaches may come nearer to the core of the summarization function. In this 
chapter, the reader is invited to look at professional summarization - abstract- 
ing and indexing - from a cognitive science point of view. Unless the cited 
authors insist, we do not distinguish between indexing and classification. 



4.2.1 Subprocesses of professional summarizing 

Cognitive processes run fast in human brains, and they have the tendency to be 
complicated. An established problem-solving strategy is to subdivide complex 
processes into simpler subprocesses or subtasks and to repeat this subdivision 
until we obtain processes which are simple enough for study. For abstracting 
and indexing processes, several authors have proposed broad subdivisions 
which list some half a dozen subtasks. Since cognitive processes are flexible 
and can simultaneously be organized according to multiple principles, there is 
nothing bad in conflicting structurations. The reader is therefore invited not to 
mind slight variations in the process organizations proposed by different 
authors. It is also normal to have separated observations about abstracting and 
indexing, reflecting the working situations where they occur. In many libraries, 
to mention only the clearest situational factor that distinguishes indexing and 
abstracting, books are indexed but never abstracted. So abstracting does not 
occur there. This book treats professional summarizing as a specialized form of 
everyday summarizing. If this assumption is true, the process structuration of 
abstracting and indexing must correspond to the structuration of everyday
summarization. Our readers are invited to compare the phases of abstracting 
and indexing processes indicated in the following to the structure of everyday 
summarization processes discussed at the beginning of Chap. 3, for instance by 
returning to Fig. 3.2. 

ROWL82/88 presents a five-step plan for abstracting which is meant more as
a guide for novices than as a description of expert behavior: 

• Step 1: Read the document with a view to gaining an understanding of its 
content and an appreciation of its scope. 

• Step 2: Make written notes of the main points in the document. Steps 1 and 
2 may be completed simultaneously, or Step 2 may be conducted during a 
re-reading of the document. 

• Step 3: Draft a rough abstract from notes recorded in Step 2. It is important 
not to transfer verbose words or phrases from the original and to heed the 
points of good style. 

• Step 4: Check the draft abstract for punctuation, spelling, accuracy, omis- 
sions, and conciseness. When all necessary amendments have been spot- 
ted, edit the draft abstract and make any improvements to the style that 
are possible. 

• Step 5: Write the final abstract. 

While ROWL82/88 subdivides the indexing process into the tasks of
familiarization, analysis and conversion of concepts to index terms, LANG89 subdivides
the indexing/classification process into the three subprocesses subject analysis, 
translation into the indexing language and construction of a register entry. 
HOVI89 investigated indexing and classification processes performed by Finnish
library assistants and students of information sciences. She recorded them
with the help of thinking-aloud protocols and arrived at a slightly different
subdivision:

• Scanning the document and establishing the text meaning. In order to estab- 
lish what is in the document, the indexer normally only reads it in part. 
Checklists list those parts of the text that should be looked at. As time is 
usually limited, a complete reading is out of the question. 

• Identifying the most important concepts of the text topic. The concepts to be 
indexed are not only oriented towards the text meaning, but also towards 
the interests of the users of the information system for whom the document 
is represented. Usually, rough standards exist for the number of descriptors 
to be assigned (indexing depth). The thesaurus helps with determining 
relevant concepts. In addition, the feedback from user enquiries is also 
taken into consideration. 

• Representing the selected concepts in the indexing language. In the next
step, the terms deemed to be important are represented with expressions 
from the indexing language. To this end, the respective descriptors have to 
be looked up in the thesaurus, checked for their applicability, and written 
down. As the indexing language characterizes the information interests of 
the system users, it is normal that not all important concepts from docu- 
ments can be represented with descriptors. 
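
The third subtask in the list above (looking up descriptors, checking them, and accepting that some concepts have no descriptor) can be sketched as follows. The mini-thesaurus, the function name `to_descriptors`, and the `depth` parameter standing in for indexing depth are all hypothetical and chosen only for illustration.

```python
# Hypothetical mini-thesaurus: entry term -> preferred descriptor.
THESAURUS = {
    "abstracting": "ABSTRACTING",
    "summarizing": "ABSTRACTING",
    "indexing": "INDEXING",
    "subject analysis": "SUBJECT ANALYSIS",
}


def to_descriptors(concepts, thesaurus, depth):
    """Map selected concepts to descriptors, keeping at most `depth` of them.

    Returns (descriptors, unrepresented). Concepts without a thesaurus
    entry are reported instead of being forced into the indexing language.
    """
    descriptors, unrepresented = [], []
    for concept in concepts:
        descriptor = thesaurus.get(concept.lower())
        if descriptor is None:
            unrepresented.append(concept)
        elif descriptor not in descriptors and len(descriptors) < depth:
            descriptors.append(descriptor)
    return descriptors, unrepresented


assigned, missing = to_descriptors(
    ["Summarizing", "Abstracting", "Indexing", "Opera plots"], THESAURUS, depth=3
)
print(assigned)
print(missing)
```

In this sketch two entry terms collapse into one descriptor, and the concept without an entry is returned separately, which mirrors the remark that not all important concepts can be represented.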

LANG89 differs from other authors because he devotes his attention to the first 
and cognitively most demanding subprocess, subject analysis. He describes the 
working process of subject analysis from a real-professional-life attitude: in- 
formation systems should convey knowledge from documents to those request- 
ing the information. It must therefore first be established what knowledge we 
are dealing with in the respective case. This is subject analysis. The procedure 
amounts to summarization of the document meaning in a core statement. The 
summarizing begins with the title. The indexer extends the cues from the title 
by picking up concepts from the table of contents or the headings of the docu- 
ment. After that (s)he should read the introduction, as this contains the most 
explicit statements about the author’s intentions. Sometimes it is necessary to 
read selectively from the individual chapters in order to verify the assumptions 
set up. It is rarely necessary or feasible to read the entire text of the document. 
Often it is more practical to look for a criticism. The topic statement resulting 
from the subject analysis summarizes the knowledge about the document 
meaning, insofar as the indexer has deemed it important. When it comes to un- 
derstanding the document and formulating a topic statement, the general cate- 
gories of good classifications are useful. For example, the eight categories that 
determine the order of quotation in the Bliss classification (BC2) are helpful. 

The phases given by CREM96 follow below. 



4.2.2 Cognitive science accounts of abstracting 

It is good, but not enough, to have a cognitive working process subdivided into 
a handful of phases and to characterize these subprocesses by a short descrip- 
tion, as done by the researchers cited above. A cognitive scientist must ask for 
more precision and more detail, aiming at a degree of scientific explanation 
that, for example, makes a computer simulation possible. 

More detail about abstracting is given in CREM82/96. The author can build 
on the state of knowledge that was elaborated by BORK75. By falling back on 
his own expert competence, CREM82 pioneers a cognitive science study of 
abstracting which outdoes all earlier approaches in concrete abstracting know- 
how. He explains his work. He formulates his procedure rules. He provides an 
overall organization of the work process. He puts appropriate emphasis on the 
reading strategies of the abstractor. The focus of his description is on the intel- 
lectual activities with which he acquires the contents of the abstract. 

Table 4.1. Stages of the human abstracting process (from CREM96)

Stage 1: Focusing on the basic features of the materials to be abstracted
Techniques: Classifying the form and content of the materials
Results: Determination of the type of abstract to be written, the relative
length, and the degree of difficulty

Stage 2: Identifying relevant information (sometimes done simultaneously
with stage 1)
Techniques: (a) Searching for cue or function words and phrases, structural
headings and subheadings, and topic sentences; (b) expanding the search based
on the results of (a)
Results: Identification of a representative amount of relevant information
for extraction

Stage 3: Extracting, organizing, and reducing the relevant information
Techniques: Organizing and writing the extracted relevant information into
an abstract, using a standard format
Results: Preparation of a concise, unified, but unedited abstract (see stage
3 abstract)

Stage 4: Refining the relevant information
Techniques: Editing or review of the abstract by the originator or editorial or
technical reviewers
Results: Completion of a good informative or indicative abstract (see
stage 4 abstract)

Stage 3 sample abstract 

Every cognitive skill draws upon part of the brain’s extensive repertoire of representational 
subsystems, storage mechanisms, and processes. This tutorial article is an introduction to research 
exploring these basic components of cognitive skill and their organization. Four areas of research 
are reviewed: the perception of objects and words; the distinction between short- and long-term 
memory mechanisms; the retrieval of remembered episodes and facts; and attention, performance, 
and consciousness. 

Stage 4 sample abstract (edited version of stage 3 abstract) 

Research on the cognitive representational subsystems, storage mechanisms, and processes of 
the brain is reviewed tutorially. Studies of (1) the perception of objects and words; (2) short-
and long-term memory; (3) the retrieval of remembered episodes and facts; and (4) attention, 
performance, and consciousness are described. 



CREM82/96 explains how one can construct well-structured abstracts. To this 
end, he determines four “approximatively” defined stages of the abstracting 
process (see Table 4.1) and describes them as follows: 

• Focusing on the document structure. In the first stage, the abstractor deter- 
mines the most important characteristics of the document: the type of 
document (monograph, dissertation, project report, essay, etc.), because 
this helps to determine the type of representation (epidemiological or so- 
ciological surveys, descriptions of methods or appliances, theoretical stud- 
ies, mathematical models, literature surveys, etc.) and the subsequent pro- 
cedure. Then he investigates whether the text is subdivided. Of particular 
interest are headings such as “Introduction”, “Methods”, “Results”, which 
help with the finding of relevant information. Further questions: Are the re- 
sults clearly recognizable? Do they come together or are they scattered? 
Are special methods described? 

• Identifying relevant information. In a rapid reading process, those text
passages are identified that contain potentially relevant material for the
abstract. Abstractors orient themselves towards headings, which indicate the
role of a chapter (“Results”, “Introduction”) and key phrases in the text 
(“In this paper we...”, “Results suggest...”). The position in the paragraph is 
also taken into consideration: the first sentence is often a topic sentence, 
the last sentence often summarizes the paragraph. 

• Extracting, organizing and reducing the relevant information. Once the most 
important information has been identified, the abstractor extracts it from 
the original document and orders it in a mental grid that corresponds to a 
standard format: purpose of the investigation, methods, findings, conclu- 
sions, recommendations. Alternatively, he uses another schema for ab- 
stracts. When writing, he checks and reduces the contents. 

• Refining the information. In the last work stage of abstracting, the rough ab- 
stract is edited and reviewed. 
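
Stages 2 and 3 of the list above can be made concrete with a rough sketch: sentences are picked out by key phrases and by their position in the paragraph, then ordered in the standard format grid. Only the cue phrases "In this paper we" and "Results suggest" come from the text. The other cues, the naive sentence splitter, and the function names are assumptions made for illustration, not a description of CREM82/96's actual procedure.

```python
import re

# Illustrative cue phrases per slot of the standard abstract format.
# Only "in this paper we" and "results suggest" are taken from the text;
# the remaining cues are assumed for this sketch.
CUES = {
    "purpose": ["in this paper we", "the aim of"],
    "methods": ["we used", "was measured"],
    "findings": ["results suggest", "we found"],
    "conclusions": ["we conclude"],
    "recommendations": ["we recommend"],
}


def split_sentences(paragraph):
    """Naive sentence splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def identify(paragraphs):
    """Stage 2: collect (slot, sentence) pairs for sentences with a cue phrase.

    First and last sentences of a paragraph that carry no cue are kept
    under the label 'position', since they are often topic or summary sentences.
    """
    found = []
    for paragraph in paragraphs:
        sentences = split_sentences(paragraph)
        for i, sentence in enumerate(sentences):
            lowered = sentence.lower()
            slot = next((name for name, cues in CUES.items()
                         if any(cue in lowered for cue in cues)), None)
            if slot is not None:
                found.append((slot, sentence))
            elif i in (0, len(sentences) - 1):
                found.append(("position", sentence))
    return found


def organize(found):
    """Stage 3: order the cue-matched sentences in the standard format grid."""
    grid = {slot: [] for slot in CUES}
    for slot, sentence in found:
        if slot in grid:
            grid[slot].append(sentence)
    return grid


text = [
    "Abstracting is costly. In this paper we study how experts read. Few studies exist.",
    "We used thinking-aloud protocols. Results suggest that experts scan headings.",
]
candidates = identify(text)
grid = organize(candidates)
for slot, sentences in grid.items():
    print(slot, sentences)
```

The sketch stops where the cognitively demanding part begins: checking and reducing the contents while writing, and the refinement of stage 4, are not modeled.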

Table 4.2 lists the reading strategies given by the author and abstractor 
CREM82/96. Curious readers can check their empirical value by comparing 
them with the strategies of the expert Edward and his colleagues in the 
intellectual toolbox of professional summarizers (see appendix of this chapter). 



Table 4.2. An abstractor’s reading strategies (CREM96) 



General reading rules for abstracting

Rule 1 Read actively to identify information for the abstract and passively
for understanding.

Rule 2 Read with standard rules and conventions and special instructions for
writing abstracts in mind.

Rule 3 Read attentively through the full abstracting process of reading,
thinking, writing, self-editing and revising.

Retrieval reading

Rule 1 Scan exploratively the text of the material to be abstracted to identify
passages containing information having potential for retrieval for inclusion in
the abstract.

Rule 2 While scanning, mentally or in the margin of the copy, note those parts
of the material that contain information on purpose, methods, findings, or
conclusions and recommendations.



In his examples in Fig. 4.3 CREM82/96 puts forward a generally applied strat- 
egy for a true condensation of texts, i.e., for formulating texts more densely 
with no or very little loss of information. The checklist for reviewing during 
abstracting (see Table 4.3) contains valuable items of know-how about 
concrete activities that an abstractor carries out. From a cognitive scientist’s 
viewpoint, some of these subtasks are easy to imagine, e.g., the checks for cor- 
rectness of punctuation or format. Other activities, such as eliminating awk- 
wardness or emphasizing main ideas, need further operationalization before the 
cognitive program behind them can be outlined. 

Fig. 4.3. Condensing information (adapted from CREM96)



Table 4.3. An abstractor’s checklist (from CREM96) 



Check for completeness.
Check for accuracy.
Check for unity and coherence.
Achieve effective transition.
Check for consistent point of view.
Emphasize main ideas.
Subordinate less important ideas.
Check for clarity.
Eliminate ambiguity.
Check for appropriate word choice.
Eliminate affectation and jargon.
Replace abstract words with concrete words.
Achieve conciseness.
Make writing active (voice).
Change negative writing to positive writing.
Check for parallel structure.
Check sentence construction and achieve sentence variety.
Eliminate awkwardness.
Check for appropriate tone.
Eliminate problems of grammar.
Eliminate sentence faults.
Check for agreement.
Check for proper case.
Check for clear reference of pronouns.
Eliminate dangling modifiers and misplaced modifiers.
Check for correct punctuation.
Check for mechanics: spelling, abbreviation, capital letters, contractions,
italics, numbers, symbols, syllabification.
Check for correctness of format.



In abstracting, document architectures are an important source of knowledge. 
As described above, an experienced professional summarizer is familiar with 
the document types of her or his domain. The summarizer concentrates on the 
document type during the very first moments of work. Then (s)he roughly 
knows the document architecture and can guide the further treatment of the 
document. In particular, the source document structure determines the structure 
of the abstract. Furthermore, it tells the abstractor from which parts of the 
document (s)he should extract information. The role of document structure 
knowledge for the abstracting process has been shown empirically by LIDD91 
for the most well-known abstract structure, that of empirical-experimental 
studies. Document and abstract structures as shown in Fig. 4.4 exist as knowl- 
edge structures in the minds of abstractors of various information systems. The 
text structures of scientific essays are, incidentally, similarly well anchored in 
the minds of practiced readers (DILL91). 

[Tree diagram. Component labels, in their order of appearance:]

background; RELATION TO OTHER RESEARCH; new terms defined; institution;
administrators; location of study; * PURPOSE; * HYPOTHESIS; research questions;
RESEARCH TOPIC; * SUBJECTS; number of experiments; time frame; * METHODOLOGY;
PROCEDURES; DATA COLLECTION; independent variable; dependent variable;
SAMPLE SELECTION; control population; CONDITIONS; materials; data analysis;
* RESULTS; reliability; DISCUSSION; significance of results; * CONCLUSIONS;
IMPLICATIONS; practical applications; future research needs; appendices;
* REFERENCES; tables; unique features; limitations

*: Prototypical component
UPPERCASE: Typical component
lowercase: Elaborated component

Fig. 4.4. Abstract structure for empirical and experimental research (adapted from 
LIDD91) 
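
The legend of Fig. 4.4 encodes three levels of typicality in the notation of the labels. As a small illustration, the following sketch reads a label and returns its level; the function name `component_level` is hypothetical, and nothing is assumed here about the hierarchy among the components.

```python
def component_level(label):
    """Classify a Fig. 4.4 label according to the figure's legend.

    '*' prefix -> prototypical, UPPERCASE -> typical, lowercase -> elaborated.
    """
    text = label.strip()
    if text.startswith("*"):
        return "prototypical"
    if text.isupper():
        return "typical"
    return "elaborated"


for label in ["* PURPOSE", "RELATION TO OTHER RESEARCH", "time frame"]:
    print(label, "->", component_level(label))
```

An abstracting aid built on such a structure could, for instance, look for the prototypical components first and turn to typical and elaborated ones only when space allows; that ordering is a suggestion of this sketch, not a claim of LIDD91.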



MOLI95 outlines a broad structure and first ideas of an empirical model that 
“would provide information science with a wide and practical explanation of 
document representation as far as content is concerned”. She thinks that a 
more detailed and reliable operational model is still to be defined. 

In MOLI94 and MOLI95, abstracting is put into an interdisciplinary perspec- 
tive that draws in particular upon cognitive science and logical and linguistic 
approaches. The author wants to clarify our understanding by means of a scien- 
tific interpretative-selective model. In Fig. 4.5 readers of the present book meet 
familiar concepts. Interestingly, the author presents the abstracting process as a
process of reproductive writing, following the classical three-stage
organization of writing models. Her structuring shown in Fig. 4.5 does not insist
on the meaning reduction which is so characteristic of summarizing. Instead it 
is, for example, capable of housing the production of a review article as well. 



[Stages shown: preanalysis, analysis, synthesis]




Fig. 4.5. A holistic account of abstracting stages (adapted from MOLI95) 




Fig. 4.6. The four key steps of abstracting (adapted from MOLI95) 



In a concurrent alternative structure of the abstracting process (see Fig. 4.6), 
MOLI95 splits the interpretation task into a first interpretation serving compre- 
hension, an information selection activity inspired by user needs, and a rein- 
terpretation leading to synthesis, i.e., to the construction of the target summary. 

Three units of the documentary environment influence the abstract production
process: 

• the abstractor 

• the source text 

• user needs 

Among these three entities, the user needs are most elusive and indirect in 
their influence, whereas abstractor and source text are a concrete person and 
object, respectively. 



4.2.3 Conceptual models of indexing and classifying 

In comparison to research about cognitive processing during abstracting, the 
cognitive modeling of indexing and classifying is still more in the phase of 
explorative idea generation and has proceeded less far towards empirical in- 
vestigation and empirically founded statements. 

FARR91 adapts the KINT83 model of text processing, text understanding, 
and text summarizing first to describe the techniques of abstracting, indexing, 
and classifying. He then concentrates on describing the indexing process, 
building a model on analogies and observations from the literature. This means 
that he presents an idea sketch of professional summarizing which is a pre- 
liminary stage of a cognitive model, but not yet a model that gives a precise 
description. Consequently, the author encourages empirical validation. 
According to FARR91, the indexer needs five kinds of expertise: 

• about the specialized field in which (s)he is indexing 

• about the document structures 

• about the systems with which (s)he is working 

• about the users of the system 

• general domain knowledge 

FARR91 first of all distinguishes a model for text understanding and for text 
production. However, he points out that in the case of indexing, classifying, 
and abstracting, a text is always assimilated in relation to a production goal, 
i.e., that the understanding process cannot be seen independently of the pro- 
duction process. He envisages one integrated process of text understanding, 
whereas he considers three different production models for abstracting, index- 
ing, and classifying, respectively. 

The author devotes most attention to the activity of scanning, or searching 
for information. In the case of abstracting and indexing, this replaces normal 
text understanding. He reports about rapid reading methods, but comes to the 
final conclusion that in abstracting and indexing a text is not only read or 
glanced through. Instead it is scanned. Indexers and abstractors assimilate the 
document selectively according to special features. The features that recommend
a text for assimilation may be typographical. Indexers and abstractors
react to words, headings, and paragraph beginnings in italic type. Verbal key
phrases (“Results suggest...”) also play a role, as do words which occur fre- 
quently and expressions belonging to certain semantic fields. Topic sentences 
receive particular attention, as well as those parts of the document that contain 
the necessary information for indexing and abstracting. Part of the expertise of 
a professional indexer and abstractor is a trained eye for document structures. 
In this context, the overall structure of the text is of interest, as well as those 
parts of the text on which the indexing or abstract can be based: introductory 
chapters are important, initial paragraphs within them, and the first sentences 
of paragraphs. These selective search skills are the result of many years of 
practice. 
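
The selective scanning described above can be made concrete in a small sketch.
The following Python fragment is not FARR91's procedure. It is a hypothetical
scoring scheme in which the cue phrases (apart from "Results suggest"), the
weights, and the names scan_score and frequent_words are assumptions made for
illustration. It rates a sentence by three of the cue types named in the text:
verbal key phrases, frequently occurring words, and position at the beginning
of a paragraph.

```python
import re
from collections import Counter

# Hypothetical cue phrases; only "results suggest" is taken from the text above.
CUE_PHRASES = ("results suggest", "we conclude", "this paper")


def frequent_words(document, top_n=3, min_length=5):
    """Return the most frequent longer words as a crude stand-in for topic words."""
    words = [w for w in re.findall(r"[a-z]+", document.lower())
             if len(w) >= min_length]
    return {word for word, _ in Counter(words).most_common(top_n)}


def scan_score(sentence, is_paragraph_start, topic_words):
    """Return an illustrative salience score for one sentence (weights are assumed)."""
    text = sentence.lower()
    words = re.findall(r"[a-z]+", text)
    score = 0
    if any(phrase in text for phrase in CUE_PHRASES):
        score += 2                                   # verbal key phrase
    if is_paragraph_start:
        score += 1                                   # first sentence of a paragraph
    score += sum(1 for w in words if w in topic_words)   # frequent words
    return score
```

A scanner of this kind would pass only the highest-scoring sentences on to
closer reading, which mirrors the claim that the text is scanned rather than
read in full.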

The mental model or situational model in the sense of KINT83 appears ac- 
cording to FARR91 as the “aboutness” model, i.e., the thematic core structure 
or macrostructure of the document. At the beginning of processing it only en- 
compasses the heading. It is gradually extended. The aboutness model is the 
basis for the production of the abstract, the indexing, and the classification. 

According to FARR91, the following types of operation take place in long- 
term memory during the formation of indexing terms: 

• Strengthening: a potential index term is consistent with the global docu- 
ment theme or matches one that has already occurred and reinforces it. 

• Modification: an index term is modified so that it corresponds better to the 
goals of the indexer or an already familiar index term. 

• Aggregation (“chunking”): a group of semantically related index terms
is bundled together under one single term.

• Rejection: a potential index term contradicts the document theme, the fac- 
tual knowledge, or the goals of the indexer and is discarded. 
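
The four operations listed above can be sketched as methods on a small data
structure. This is an illustrative reading, not an implementation given by
FARR91: the class TermMemory, the numeric weights, and in particular the test
used for rejection (no word in common with the theme) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class TermMemory:
    """Toy stand-in for the index terms held in memory (an assumption)."""
    theme: set                                     # words expressing the document theme
    weights: dict = field(default_factory=dict)    # accepted term -> strength

    def strengthen(self, term):
        """Strengthening: a term that fits the theme or recurs gains weight."""
        self.weights[term] = self.weights.get(term, 0) + 1

    def modify(self, term, preferred):
        """Modification: replace a term by a form that suits the indexer better."""
        moved = self.weights.pop(term, 1)
        self.weights[preferred] = self.weights.get(preferred, 0) + moved

    def aggregate(self, terms, broader):
        """Aggregation ("chunking"): bundle related terms under one single term."""
        total = sum(self.weights.pop(t, 0) for t in terms)
        self.weights[broader] = self.weights.get(broader, 0) + total

    def consider(self, term):
        """Rejection: discard a term unrelated to the theme, else strengthen it."""
        if term in self.weights or set(term.split()) & self.theme:
            self.strengthen(term)
            return True
        return False                               # rejected, nothing is stored
```

Calling consider for every candidate term, and modify or aggregate where the
indexer sees fit, leaves in weights the terms that survived processing.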

The abstract production model produces a prose statement about the document 
topic, in the case of informative abstracts also about methods, conclusions, and 
analyses from the original document. This requires a deeper semantic process- 
ing than for indexing or classification. 

The indexing model produces indexing terms that indicate the topics of the 
document. In the case of indexing it is important to keep to the controlled tar- 
get vocabulary. A further restriction in expression arises from the indexing 
depth. Potential index terms are compared to the target vocabulary and modi- 
fied if the envisaged term does not exactly correspond to it. 
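
The comparison with a controlled target vocabulary and the restriction by
indexing depth can be illustrated as follows. The function index_document, the
synonym table, and the treatment of indexing depth as a simple upper limit on
the number of descriptors are assumptions for the purpose of the sketch, not
details taken from FARR91.

```python
def index_document(candidates, vocabulary, synonyms, depth):
    """Map candidate terms to controlled descriptors, keeping at most `depth` terms.

    candidates: potential index terms in order of appearance
    vocabulary: set of permitted descriptors (the controlled target vocabulary)
    synonyms:   hypothetical table mapping free terms to permitted descriptors
    depth:      assumed upper limit on the number of descriptors assigned
    """
    descriptors = []
    for term in candidates:
        if len(descriptors) >= depth:
            break                                # indexing depth reached
        term = term.lower()
        if term not in vocabulary:
            term = synonyms.get(term)            # modify towards the vocabulary
        if term in vocabulary and term not in descriptors:
            descriptors.append(term)
    return descriptors
```

Terms that neither belong to the vocabulary nor can be mapped onto it are
silently dropped in this sketch; a real indexing system might instead flag
them for a vocabulary update.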

The classification model produces a coded representation of the global 
“aboutness” of the document. The document topic model is compared to the 
tables of the classification model. Changes become necessary if the document 
topic cannot be expressed with the help of the classification system.

The indexing model of FARR91 emphasizes important aspects of the indexing
process: 

• the expertise needed for abstracting and indexing 

• the professional method of searching for information, which replaces nor- 
mal text understanding 

• the common basis of text summarizing, which links abstracting, indexing, 
and classification. 

Besides these, the author lists many interesting details about indexing and ab- 
stracting which are gleaned from the literature. 

In order to clarify the cognitive act of classification, BEGH86 relies on text- 
theoretical assumptions (DIJK80, KINT78, KINT83). She follows the argumen- 
tation of HUTC77 and explains the aboutness of a document through its macro- 
structure. If the macrostructure has been built up in the interaction between 
data- and expectation-driven processing, then the aboutness of a document is 
known. In the subsequent classification activity, there are always two texts to 
be considered: the source document and the classification system, whose text 
meaning structure will prevail in the target text because the classification sys- 
tem determines the representation possibilities. Consequently, classifying 
processes work with intertextual relations between the document and the classification
system. A cognitive model for the classification process must therefore
explain how the expert, one step after the other:

• infers a deep semantic representation in the form of propositions from the 
text surface of the original document, 

• acquires the semantic propositional representation of the classification sys- 
tem from its surface structure, 

• establishes a relation between the two representations, whereby the classi- 
fication system with the appropriate rules serves as an artificial system for 
expressing linguistically represented document contents, 

• converts the propositional structure, which has resulted from applying the 
classification system to the document, into a code that corresponds to the 
classification system. 

By carrying out these steps, the indexer files the document in a class with 
similar ones. (S)he sets up two groups of intertextual relations: 

• between the document and the classification system, 

• between the document and neighboring documents in the same class of the 
system. 

BEGH86 proposes checking her framework model empirically.



4.3 An empirical cognitive model of professional summarizing



The work reported above provides a broad description of the thinking processes 
in abstracting and indexing in terms of process stages, interspersed with some 
observations regarding details. The stages, steps, subtasks, or individual tech- 
niques are given a verbal description. Beyond the very broad characterization 
of process stages, almost no coherence is built up between pieces of knowl- 
edge assembled from different sources. CREMM82/96 is the pioneer of a cog- 
nitive psychology approach to abstracting. BEGH86 and FARR91 put their con- 
siderations into a more general theoretical framework and refer to the discourse 
processing model proposed by van Dijk and Kintsch (best described in 
KINT83). MOLI94/95 is the most active champion of an interdisciplinary approach.
A more complete, more precise, empirically more valid, and theoretically
better founded description is still missing and is called for.

In order to appreciate the research situation, readers are invited to compare 
what they learned about everyday summarizing in Chap. 3 and the results of 
abstracting and indexing research that are presented in the first section of this 
chapter. The types of theories that we have for everyday summarizing are 
lacking for professional summarizing, as can be seen from the sketchy and 
global quality of the reported abstracting and indexing models. The author of 
this book decided that professional summarizing was at least as interesting as 
everyday summarizing and deserved to be explained as carefully. An even 
more thorough investigation seemed to be called for, given that professional 
summarizing is done by experts and therefore possibly involves more skilled 
cognitive work than occasional summarizing by laypersons. At least within the 
perspective of computerized summarization, we must be more curious about 
expert summarizing than about the performance of less trained people. In the 
following, readers are presented with the outcome of this investigation. The 
presentation style changes from a state-of-the-art report or parade of ap- 
proaches to the much more unified account of an empirical study which ends 
with a computer model. 

From the discussion of cognitive processes and everyday summarization in 
the previous chapters, readers are already advanced on their way to more 
knowledge about professional summarizing. Knowing that expert summarizing 
is a special case of summarizing, we are justified in transferring factual knowledge,
research standards, and methods from general research about summarization
and cognitive processes to the less explored domain of professional summarizing.
The target domain is not easy; we therefore have to carry across considerable
theoretical and methodological baggage. It is surveyed in the following.
General principles are explained first. Then, we adopt a knowledge engineering
perspective and cover the whole way from domain data to an implemented
expert simulation system (SimSum - see the CD accompanying this
book). After that we concentrate on the empirical model of professional summarizing,
which is discussed in the rest of this chapter, first expounding in brief
the research methods and then the results.

Results are depicted first in general terms and then demonstrated by se- 
quences from observed working processes. 

A cognitive model is a suitable means to formulate a more detailed insight 
into professional summarizing, i.e., abstracting and indexing. It describes the 
problem-solving behavior of summarization experts. It answers the question of 
what a summarization expert does to achieve a summary and what knowledge 
(s)he applies in doing so. Developing such a detailed empirical model requires 
a major effort combining methods from social sciences, cognitive psychology, 
linguistics, and knowledge processing. Since the methods shape the resulting 
model, it is important to have some familiarity with them in order to under- 
stand the summarizing model produced by them. 

The question of how experts solve problems is known from knowledge engi- 
neering, which is part of the expert system methodology. Whereas most expert 
systems aim at doing the same job as an expert without necessarily using the 
same methods as the human experts, we must think of an expert simulation 
system as a special type of expert system. Such a system may also have the 
main aim of explaining human expert performance. While the general frame- 
work of expert systems development is applicable, an expert simulation model 
requires stronger empirical evidence. Hence we must invest more empirical 
effort to discern a detailed image of expert performance. 

What is needed, first of all, is an explanation model which suffices for a per- 
son to perform summarization. Fortunately, this does not imply a complete de- 
scription or theory of the cognitive processes during summarization. What we 
need is a so-called naive model (or theory) of the process (NORM83), or a 
subjective theory (GROEB88). Models and theories of this type allow a subject 
to act purposefully in a domain, without claiming to tell the definite full truth 
about it. Instead they tell us enough and are close enough to the truth to enable 
successful action. In particular, they refer to the limited knowledge of layper- 
sons in everyday situations and not to scientific knowledge. The reader may 
compare naive subjective models and theories to the knowledge which allows 
him or her to use an appliance such as an iron. In order to iron a skirt, it is not 
necessary to have a complete model of the iron in mind, but one must know 
enough to apply it successfully. In routine contexts, even domain experts may 
prefer naive theories to the much more sophisticated scientific ones: when 
switching the light on, a physicist will not activate her or his extended knowl- 
edge about electricity, but simply think that the switch turns the light on and 
off.

When producing summaries as a routine task, an expert may very well rely
on naive theories as long as they work all right, and refer to scientific reasoning
only in the case of problems. (S)he is assumed to apply common and expert
summarization techniques. Methods learned at high school and during academic
education may mix with expert techniques learned on the job.

The empirical model of professional summarizing presented in the following is
a grounded theory (GLAS80, STRA87), the outcome of qualitative field re- 
search (MAYR90) or naturalistic enquiry (LINC85). The stance of grounded 
theories is that meaningful theories about social phenomena, which are often 
complex by nature and poorly understood, can be obtained inductively by sys- 
tematic data analysis. As humans react to their environment, the researcher 
has to study their behavior in its natural setting, accepting the heavy burden of 
field research instead of laboratory tests. The initial knowledge about the do- 
main is expanded incrementally by new concepts developed from empirical 
observation. Whether the model is valid is shown predominantly according to
the coherence principle of truth. The more comprehensive and complex a net- 
work of statements is, the stronger are the constraints it imposes on new hy- 
potheses. Since every hypothesis has to stand the pressure of many other ob- 
served facts, the survival chances of wrong hypotheses are indeed low. 

The investigation of cognitive processes such as summarizing needs special 
methods which are called introspective because they presuppose a person who 
reports what happens in her or his mind. A detailed discussion of introspective 
data and methods is given in ERIC80/84. In particular, thinking-aloud protocols 
are applicable for learning about summarization processes. Obeying the basic 
thinking-aloud instruction, the subject talks about all operations during task 
execution and thus records all states of the process. We obtain rich intro- 
spective data, but the record is also prone to gaps and distortions. Interpreta- 
tion therefore draws upon additional evidence from input, output, and other 
sources in order to reconstruct a more complete image of the cognitive activi- 
ties from their traces in the thinking-aloud protocol. Readers may skip to the 
segments of real summarizing processes (Sect. 4.3.6) to see what thinking- 
aloud protocols look like. They figure in the protocol segment box in the mid- 
dle of the exhibits. 

For interpreting introspective data about such complicated processes as 
summarizing, we are lost without a sophisticated interpretation framework 
based on preexisting theories or models. Fortunately, summarizing is a part of 
discourse processing. The cognitive model of discourse processing proposed by 
Kintsch and van Dijk (KINT83 - for details see Sect. 3.2) can therefore guide 
the interpretation of summarization process records. It provides an idea sketch 
of overall summarization processes and of specific strategies, based on ex- 
perimental results and theoretical considerations. Besides the discourse pro- 
cessing model, the writing model for expository texts developed by HAYE80 
applies, since summary writing is a variation of expository writing. Whereas 
the KINT83 model is not very explicit about process organization, the writing 
model states stages of the process and broad hypotheses about their internal 
structure.



4.3.1 The path from summarization practice to its computer simulation

For further understanding we have to look at the role of empirical methods in 
knowledge engineering. They are needed in order to learn or model the meth- 
ods of experts before we can transmit them to a system. Then, the research 
procedure advances to an implemented simulation, in our case to the SimSum 
system. It demonstrates with a limited quantity of original data how cognitive 
subprocesses (“agents”) cooperate to reach summarization goals. 

Let us recall that the implemented simulation serves scientific and presenta- 
tional purposes: 

• A computer model presupposes a conceptual model which is detailed and 
precise enough to be implementable. Thus a computer simulation demon- 
strates a quality of the theory that complies with this demand. It checks 
and corrects the empirical model. 

• A computerized presentation system makes it easier to follow complicated 
processes, because the system at every moment shows the information 
which is useful to understand the current situation, rearranging the presen- 
tation as often as necessary. In contrast to readers, system users profit from 
the simulation effect as known from flight simulators. They are immersed 
in the summarizing process and can concentrate on understanding the 
cognitive activity itself. 

Figure 4.7 shows the investigators’ path from summarization practice to Sim- 
Sum, the implemented expert simulation system that accompanies this book 
on CD-ROM. The overall research procedure is inspired by the KADS meth- 
odology of model-driven knowledge engineering (SCHR93). KADS divides the 
research process into three phases. It is up to the researcher to choose the 
appropriate empirical, theoretical, and implementational tools which help to 
fight the way through the KADS modeling procedure. The SimSum system’s 
KADS procedure has been configured as follows: 

The empirical modeling phase results in a conceptual domain model. The
first KADS modeling phase is strictly empirical. The researcher uses appropriate
empirical methods and theories and strives for an empirically founded
model whose statements are justified by observation. Where empirical evi- 
dence is weak or missing, the corresponding empirical model has unsafe areas 
or even gaps. Theory validation follows empirical standards. 

To investigate professional summarizing, thinking-aloud protocols capture
the data, a cognitive model of human text processing guides data interpretation,
and the empirical investigation plan follows the methodology of qualitative
modeling and grounded theories (compare Sect. 4.3.2).

The system design phase starts with a conceptual model of the domain
and reconstructs it with computer-oriented concepts and theories. Con- 
verting an empirical model into a computer-oriented design model implies a 
serious reinterpretation because computerizable theories are now used as an in- 
terpretation grid. Knowledge that they cannot represent is lost, and what re- 
mains is reshaped according to computer concepts and operationalized as a 
technical construct. Incomplete empirical evidence must be supplemented with 
constructed functional units. 

For its internal text meaning representation, SimSum has a common imple- 
mentable text representation working with frames and propositions. The cogni- 
tive strategies are projected to object-oriented agents in order to become 
software. They cooperate in a blackboard system architecture (ENGE88, 
CARV94): All agents write on and read from a communication area called a 
blackboard. Thus messages are distributed amongst all agents that may be con- 
cerned. 
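
The blackboard cooperation described above can be illustrated with a minimal
sketch. It is written in Python for readability only and does not reproduce
the SimSum code; the classes Blackboard, Agent, Relevance, and Writer, the
message format, and the turn-taking loop in run are all assumptions made for
illustration.

```python
class Blackboard:
    """Shared communication area that all agents write on and read from."""

    def __init__(self):
        self.messages = []

    def write(self, sender, topic, content):
        self.messages.append({"sender": sender, "topic": topic, "content": content})

    def read(self, topic):
        return [m for m in self.messages if m["topic"] == topic]


class Agent:
    """Base class: an agent reacts to messages on the topic that concerns it."""
    name = "agent"
    listens_to = ""

    def __init__(self):
        self.seen = 0                      # messages on its topic already handled

    def step(self, board):
        pending = board.read(self.listens_to)[self.seen:]
        self.seen += len(pending)
        for message in pending:
            self.handle(message, board)
        return bool(pending)               # True if the agent had something to do

    def handle(self, message, board):
        raise NotImplementedError


class Relevance(Agent):
    """Hypothetical agent: posts sentences that contain a topic word as relevant."""
    name, listens_to = "relevance", "sentence"

    def __init__(self, topic_word):
        super().__init__()
        self.topic_word = topic_word

    def handle(self, message, board):
        if self.topic_word in message["content"].lower():
            board.write(self.name, "relevant", message["content"])


class Writer(Agent):
    """Hypothetical agent: collects relevant material into the summary."""
    name, listens_to = "writer", "relevant"

    def __init__(self):
        super().__init__()
        self.summary = []

    def handle(self, message, board):
        self.summary.append(message["content"])


def run(board, agents):
    """Let all agents take turns until none of them finds new messages."""
    while any([agent.step(board) for agent in agents]):
        pass
```

No agent calls another agent directly. Each one only reads from and writes to
the blackboard, so that control emerges from the agents' reactions to posted
messages rather than from a central controller.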

An implemented system is designed definitively for users. In addition to the 
shifts caused by the technical support, it has to present its interpretation of 
facts for a particular audience. To this end, the implemented system must be 
complemented by a user access function. Its presentation of theoretical and 
empirical knowledge must be reinterpreted for ease of understanding in human- 
computer interaction. 



[Figure not reproduced; surviving labels: discourse representation, object orientation]

Fig. 4.7. The SimSum KADS procedure of knowledge engineering



From human strategies to cognitive and computational agents. Perhaps the most 
noticeable switch between the conceptual model of summarizing and the computer
design model used in SimSum concerns strategies and agents. The empirical
tool metaphor for strategies is replaced by the cognitive and computer
science agent metaphor. A tool metaphor is socially functional because it emphasizes
personal responsibility. Face to face with an expert, it is natural to
consider her or him as a person who is in control and applies intellectual tools. 
However, this is not the whole truth of human cognitive organization. 

Physical tools convey their intentionality by their shape. For instance, a 
comb has teeth in order to separate fibers. In the same style an intellectual tool 
must conform to its aims. Thus a mental subtraction device is equivalent to a 
pocket calculator function. One thinking step further and we assign the 
subtraction skill of an individual to her or his mental subtraction device, i.e., 
we allocate competence to a decentralized intellectual device or program. 
This is supported by the well-known fact that the human mind indeed combines 
centralized and decentralized activities. Everybody knows it from their own 
experience: 

• Only in exceptional cases will the reader monitor the position of her or his 
left knee when reading these lines. In spite of this lack of attention, a sud- 
den movement when getting up because the phone rings will seldom end 
with a bruised knee, because a cognitive subunit accounts for the knee’s 
position and avoids false moves. 

• Memory shows some autonomy. Consider for instance the following: The 
name of an Italian village does not come to mind when needed. You move 
on and while you disregard the problem, hours later, memory comes up 
with the name while you concentrate on the enterprise’s bookkeeping. 
Without your awareness, perhaps against your conscious intention, mem- 
ory continued its search. 

The obvious conclusion is that our conscious mind is not in control of all men- 
tal activities. The contrary is true. Intellectual activities of different types may 
occur in parallel under mixed central and decentral control. This happens when 
your memory performs a background search for an Italian village while you 
estimate your company’s taxes. A useful and established metaphor calls the 
units of independent parallel performance intellectual agents or cognitive 
demons. As early as in 1959, Selfridge (SELF59) used cognitive demons to de- 
scribe pattern recognition. 

Where empirical evidence is missing, an empirical model of human cogni- 
tion can refrain from explaining how the subunits of intellectual performance 
interact. A cognitive science approach must explain their interaction - other- 
wise no virtual or real computer system will run. Since we mostly investigate 
subsystems of human cognition, the central control unit - the person’s con- 
scious awareness - is not present in the model. The control must be attributed 
to autonomous subunits that organize their cooperation during the execution of 
a task. The blackboard metaphor provides a model of interaction between cognitive
or computational agents.

All this applies to the simulation of summarizing. We deal with a subsystem of
human cognition, we have parallel processing (most obviously thinking and 
writing), and we must account for several processes that cooperate for instance 
when deciding about interesting material to look for in the input. Lacking a 
complete model of the expert’s person, there is nobody in a computer system 
to apply the empirical intellectual tools. So a reasonable solution is to engage 
units like cognitive agents to perform the human cognitive strategies. 

When it comes to an implementation, the agent metaphor reveals its affinity 
to an object-oriented programming style. This makes it even more attractive, 
without hiding the long-standing merits of the agent metaphor in cognitive science.
To prevent an overly superficial technical view of cognitive agents in SimSum,
I include the drawing from LIND81 that decorated our door throughout
SimSum development. It shows Selfridge’s cognitive demons working at their
blackboard (Fig. 4.8).

The implementation phase transforms the implementable model into a 
software product. When transforming a system design model into an imple- 
mented system, it must again be reinterpreted, this time in order to implement 
it with given hardware and software tools. This means, for instance, imple- 
menting the concept of chance by using a random number generator. All fea- 
tures not implementable with available technology must be canceled. 

For SimSum, CLOS (Common Lisp Object System) was chosen for
implementing the cognitive agents. Multimedia tools make the cognitive 
process visible and provide a user access system. The simulation is presented 
on screen by Macromedia Director. The medium for shipping the system is
a CD-ROM. 



4.3.2 Setting up the empirical model 

Whereas above the knowledge engineering perspective integrated empirical 
modeling as a means to set up an expert system, an empirical description of 
summarizing is a perfectly valid main issue for a model and for a monograph. 
Only as a second priority does it matter whether it is presented by a computa- 
tional model or an empirical one. In such a view, a computer model is seen as 
a verification tool (see DIES71). It can help to check empirical research for 
completeness and applicability, it can stimulate new hypotheses, but it will 
not by itself prove the validity of an empirical theory. 

The following description of the empirical modeling methods briefly explains 
where the concepts of the summarization model came from, and what they 
look like. To give readers an orientation, we proceed from the general to the 
specific in characterizing the empirical methods, and talk about:

• the tenets of qualitative empirical field research, pinpointing the basic
principles 

• the research procedure whose outcome is the summarization model 

• deriving new strategies from observation, which is here the central mo- 
ment of grounded theory development. 

Methodological principles of qualitative field research. In the following list 
of methodological principles (called in the original The 13 pillars of qualitative 
field research, compare Fig. 4.9), readers will notice practices such as the use 
of case studies, the interaction between the investigator and the domain, or in- 
trospection. They are characteristic for qualitative approaches, although less 
common in standard empirical research. For researching human behavior they 
are essential. For instance, to deal scientifically with introspection instead of 
banishing it is a precondition for investigating internal states of the mind (for 
more see ERIC80).

Fig. 4.9. The 13 pillars of qualitative field research (adapted from MAYR90)



The tenets of qualitative empirical field research are given very briefly in the 
formulation of MAYR90 (for equivalent guidance in English see LINC85):

1. Case studies. During the research process it is necessary to document and
analyze particular cases. They are essential for checking the adequacy of 
procedures and interpretations. 

2. Openness. The research process remains open to the domain of investiga- 
tion, such that the theoretical structuring, individual hypotheses, and the 
methodology can be revised as needed. 

3. Methods control. In spite of its openness, the research must follow a con- 
trolled procedure. The individual steps obey well-founded rules which are 
explained and documented. 

4. Prior knowledge. In the domain of social sciences the analysis is always 
molded by the analyst’s prior knowledge. Therefore her or his prior knowl- 
edge must be made explicit and developed under the influence of the in- 
vestigation domain. 

5. Introspection. Introspection is a valid source of information. 

6. Interaction between researcher and domain. Research is not seen as the reg- 
istration of so-called objective features of the domain, but as an interactive 
process during which both the researcher and the domain may develop and 
which allows for subjective interpretations to come up and to change. 

7. Holism. Functions and contexts of human life which have been modularized 
for investigation must be integrated again for a holistic interpretation. 

8. Historicity. Since all domains of the humanities have a history, the ap- 
proach to them must be predominantly historical. 

9. Problem orientation. Concrete and practical problems of the domain are 
preferred in research. They structure the scientific results. 

10. Generalization by argumentation. When generalizing results of research in 
the humanities, an explicit argumentation must explain which results can 
be generalized to which situations, domains, and time periods. 

11. Induction. In the social sciences, inductive methods for backing and gener- 
alizing results are central. They need control. 

12. Rule concept. In the humanities, regularities are better described by context- 
dependent rules than by general laws. 

13. Quantification. Qualitative investigations prepare meaningful quantifica- 
tions that may support the validation and generalization of the results. 

The research design. The summarization model discussed here results from 
qualitative field research. Figure 4.10 presents its research design. Its task is to 
reconstruct from observation a human knowledge processing system which 
generates the observational data, according to the line proposed by BERE87. 
At first glance the research plan does not differ very much from a standard em- 
pirical investigation procedure. We find the characteristic preparatory phase in- 
cluding steps 1-3, followed by the core phase of empirical modeling (steps 4- 
6), and an evaluation phase that applies the usual qualitative techniques, 
namely interpretation and triangulation (step 7). In contrast with standard empirical
procedures, no overall feedback loop from results to hypothesis generation
has been indicated. At the macro level, empirical modeling is indeed
straightforward. Conceptual restructuring as well as hypothesis generation and 
test occur inside the steps indicated by boxes in Fig. 4.10. They are at the heart 
of empirical activity, producing there the characteristic cycles of grounded the- 
ory development. 




Fig. 4.10. Organization of the empirical investigation 



In the operational empirical phase (phase 4 in Fig. 4.10), six experts took 
thinking-aloud protocols of nine summarizing processes per person: three ex- 
amples of abstracting a short document (conference paper or journal article), 
abstracting a long document (report or monograph), and indexing and classify- 
ing a paper, respectively. All the experts remained in their natural working 
context. They used 52 different documents of their personal choice. The group 
was composed of persons who combined summarization expertise as abstrac-
tors, indexers, and technical editors with at least some experience in teaching.
Teaching experience was required from experts to avoid a problem well known 
in the knowledge engineering literature: experts who can perform a task with- 
out the ability to explain it. An expert who has to transmit his or her intellec- 
tual skills to others must reflect on them and verbalize them. (S)he will be 
able to describe them during thinking aloud, too. 

Deriving new strategies from observation. To set up an empirical model of 
a domain, new concepts must be obtained by interpretation of domain data. In 
the end, they must make up a grounded theory of the domain. Since the model 
of summarizing is a process model, it is first and foremost constructed from 
process concepts, called strategies according to KINT83. They correspond to 
units of the human cognitive program for text processing as sketched by 
BERE87. 

For every expert an individual process model of summarization was elabo- 
rated. The data interpretation procedure was the same for all processes: 

• Transcription of the audiotape. A simple orthographic transcription was 
used, recording pauses and background noises, indicating deviations from 
standard English or German. Readers can inspect the style of transcription 
by looking at the protocol segments of Fig. 4.11 or in the working step ex- 
hibits below. 

• Subdivision of the working process into steps. Working steps are delimited 
periods of cognitive activity. They were identified in the thinking-aloud
protocol by cognitive boundary signals expressed through interjections
(“now let’s ...”, “ok”, “oops”, “next”, “finished”, etc.), pauses and ends of
cognitive actions (input-output cycles, switches to new activities, dealing
with new input items, etc.).

• Interpretation in terms of the discourse processing model. The interpretation 
started out from the KINT83 framework and operationalized its strategies 
to a degree that a name, a functional definition, and application envi- 
ronments could be given (for a more detailed explanation see below). 

• Discussion of the interpretation with the expert, correction if needed. The 
preliminary results were discussed with the expert. This enhances the qual- 
ity of the research outcome. From an ethical viewpoint, too, the investiga- 
tor should submit her data for the test subjects’ approval. 

• Developing the model: organization and presentation of results. The result- 
ing strategies were organized hierarchically according to their functions. 
The ordered collection was dubbed the expert’s individual intellectual 
toolbox. To describe the structure of the working processes more globally, 
process overview plans were developed. Working sequences of general in- 
terest were chosen for studying the interaction of strategies.

Inductive concept formation works by abstracting from data, in the present
case by defining strategies from the evidence given in a working step, under 
the constraints which are given by the theoretical framework, the whole of the 
observational data and the method of grounded theory development. 
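
The subdivision of a protocol into working steps, as listed in the data interpretation procedure, can be pictured with a minimal sketch. It assumes that the transcript is available as a list of utterances, that pauses are transcribed as "<pause>" (an assumed convention), and that the boundary interjections are limited to the examples quoted in the list. The function name split_into_steps is hypothetical. Real protocols need the investigator's judgment, so this illustrates the principle rather than the procedure used in the study.

```python
import re

# Boundary interjections quoted in the text; illustrative, not exhaustive.
BOUNDARY_SIGNALS = ("now let's", "ok", "oops", "next", "finished")


def split_into_steps(utterances):
    """Group transcript utterances into working steps.

    A new step starts whenever an utterance opens with a boundary signal
    or is a pause marker ("<pause>", an assumed transcription convention).
    """
    steps, current = [], []
    for utterance in utterances:
        text = utterance.strip().lower()
        is_pause = text == "<pause>"
        starts_with_signal = any(
            re.match(rf"{re.escape(signal)}\b", text)
            for signal in BOUNDARY_SIGNALS
        )
        # Close the running step at a boundary, but never emit empty steps.
        if (is_pause or starts_with_signal) and current:
            steps.append(current)
            current = []
        if not is_pause:
            current.append(utterance)
    if current:
        steps.append(current)
    return steps


# Invented protocol fragment for illustration only.
protocol = [
    "so, reading introductory material",
    "the paper deals with indexing",
    "ok, that is the topic",
    "<pause>",
    "next the conclusions",
]
steps = split_into_steps(protocol)
```

In this invented fragment the interjection "ok" and the pause each close a step, so three working steps result.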

Figure 4.11 illustrates how a protocol segment of a working step is inter- 
preted in order to find the cognitive strategies that account for the observed ac- 
tivities. For readers who dislike trusting strategies that come out of the dark, or 
who plan to work with thinking-aloud protocols, it is useful to look in more
detail at how strategies (i.e., constructs of the model) are derived from empiri- 
cal evidence. 

During interpretation the researcher assigns observed cognitive activities to 
strategies - recurring tasks of cognitive processing - until all activities are ex- 
plained. If a suitable strategy is missing, it is created. In our step Mackin-1 
(Fig. 4.11), only frequent strategies occur, for example: 

• start-explore. There is the beginning of a cognitive process, expressed by 
“so, reading introductory material”, i.e., the start signal and the following 
characterization of the started process. What is started is an exploration 
sequence made up predominantly of exploration steps. Starting exploration 
sequences is the task of the strategy named start-explore. We postulate 
that it is active because it is executed and reported in the thinking-aloud 
protocol. 

• explore. The current working step is dedicated to exploration. There must 
be a unit which directs the step to the right aim by steering the typical 
plan for exploration. This function is assigned to the step-leading strategy 
explore. It must be behind the scenes since we observe a successful ex- 
ploration step. 

• plan. After starting the working process, the summarizer announces what 
he is going to do now (“reading introductory material”). The strategy plan 
accounts for this unit of observed behavior. Since the summarizer shows a 
systematic and routine approach, the plan strategy is assumed to simply 
activate and verbalize the first item of a standard working plan in memory. 
Explaining the strategy thus allows its later implementation to become 
visible. 

• browse. The summarizer picks up information from the source paper. 
While thinking aloud, he reads it to the listeners. There is no evidence for 
any special constraints that restrict his information intentions, for instance 
to highlighted items. So we assume an open acquisition intention: the 
summarizer is reading for understanding. The strategy browse imposes the 
observed open acquisition style. Incidentally, modeling information acqui- 
sition intentions is necessary because they can vary with professional 
summarizers to a great extent.

• read-form. There must be a visual reading process, copying characters
from the source document to the brain of the summarizer. Since there are 
no complications of information intake in our sample step, a basic reading 
functionality suffices. Therefore the read-form strategy is assumed to work. 
It recognizes characters and layout features (necessary for finding the be- 
ginning of the paragraph and the text) and copies them to a memory 
register. 
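
The interpretation loop described above (assign every observed activity to a strategy, create a strategy if none fits) can be sketched as follows. This is an assumption-laden illustration: the predicates stand in for the researcher's judgment, the activity descriptions are invented, and the naming scheme for newly created strategies is hypothetical. Only the strategy names start-explore, plan, and browse are taken from the text.

```python
def interpret_step(activities, toolbox):
    """Assign each observed activity to strategies; create one if none fits.

    `toolbox` maps a strategy name to a predicate that decides whether the
    strategy accounts for an activity. The predicates are placeholders for
    the researcher's interpretation.
    """
    assignments = {}
    for activity in activities:
        matches = [
            name for name, accounts_for in toolbox.items()
            if accounts_for(activity)
        ]
        if not matches:
            # No suitable strategy exists, so a new one is created.
            new_name = f"new-strategy-{len(toolbox) + 1}"
            toolbox[new_name] = lambda a, target=activity: a == target
            matches = [new_name]
        assignments[activity] = matches
    return assignments


# Invented predicates and activities, for illustration only.
toolbox = {
    "start-explore": lambda a: a.startswith("so,"),
    "plan": lambda a: "reading introductory material" in a,
    "browse": lambda a: a.startswith("reads aloud"),
}
activities = [
    "so, reading introductory material",
    "reads aloud first paragraph",
    "underlines a phrase",
]
result = interpret_step(activities, toolbox)
```

As in step Mackin-1, one utterance may give evidence for several strategies: here the first activity is claimed by both start-explore and plan, while the last one triggers the creation of a new strategy.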

Readers may ask how the remaining strategies of the working step (by-form,
top-level, first, unit, relevant-scheme, relevant-texthint, relevant-unit, hold, and 
underline) were discovered. They are kindly invited to try protocol interpreta- 
tion on their own by obtaining the respective strategy definitions from the in- 
tellectual toolbox (appended to this chapter), finding their empirical evidence 
in Fig. 4.11, and determining how the strategy is justified by its observable 
traces. Complete treatments of the Mackin-1 working step are given in the 
Mackin sequence below and in the SimSum system.

Fig. 4.11. Some strategies and their thinking-aloud evidence in working step
Mackin-1. Arcs connect strategies to their home text passages.

The role of constraints. As described above, grounded theories are con- 
structed inductively in a theoretical framework. Readers may fear that the re- 
sulting theories are no more than the investigator’s personal interpretations. 
This is not true, because concepts of a grounded theory are established under 
heavy constraints. Combined constraints largely preclude unfounded hypothe- 
ses, without excluding alternative interpretations. Besides being suggested by 
factual evidence, a newly constructed strategy must fulfill a set of restrictions 
(see Fig. 4.12 - the reading direction is counterclockwise). It must: 

• match the assumptions of the guiding text processing model. If data force
interpretations which are not covered by the research guiding model, and 
which cannot be changed, the model must be adapted. If it cannot be 
modified accordingly, the investigation has a serious flaw. 

• be established according to the rules of qualitative modeling methodology.

• comply with the local data integrity conditions, i.e., input, output and mem- 
ory contents in the current step. If for instance observed output contradicts 
the assumption that a specific strategy has been active, the strategy must 
be retracted. 

• fit into the local processing integrity conditions, i.e., share the cognitive 
work with the other strategies of the step such that empirically observed 
behavior is achieved and the whole step can be reconstructed. 

• respect the conditions set by input, output, and memory contents during ear- 
lier and later working steps. 

• preserve the functional unity of the strategy in all working steps where the 
strategy occurs. The definition of the strategy is identical in all its con- 
texts. If a definition is functional in one step and is not in another, the re- 
searcher must find a better solution: split a strategy in two, change its ap- 
proach, make it more general or anything else that copes with all con- 
straints. 

• preserve dissimilarity from other strategies in the individual toolbox, avoid- 
ing an unjustified functional overlap. 

• respect the functionality of the individual model in which it takes part. This
condition would be violated for example by strategies that compete with 
others for the same task. 

• contribute to the functionality of the group model. Strategies which occur
twice under different names in different individual models are a frequent 
sin against this principle. They must be weeded out.

Fig. 4.12. Constraints on inductive strategy definition



Every time a strategy that is well backed up by data contradicts the grounded 
theory established so far, the theory is restructured by changing all affected in- 
terpretations. Thus, the characteristic cycles of grounded theory development 
emerge. 

Some of the restrictions for strategies and theory are brought in by the case 
study technique. They become visible when the individual models are inte- 
grated into a group model. Then, strategies need correction if they do not com- 
ply with the functionality of the group model, e.g., if they present an uncon- 
trolled functional overlap with another expert’s strategy or an inability to coop- 
erate in some working step of some other expert’s individual model. It may 
also happen that individual models must be restructured in order to fit them 
into the classification scheme of the group toolbox. 
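
One of the group-level checks, finding strategies that occur twice under different names in different individual models, can be illustrated with a small sketch. It assumes that every strategy carries a functional definition as plain text and treats two strategies as duplicates when their normalized definitions coincide. This is a crude stand-in for the researcher's functional comparison. All expert labels, strategy names, and definitions below are invented.

```python
from collections import defaultdict


def find_duplicate_strategies(individual_models):
    """Return functional definitions that appear under more than one name.

    `individual_models` maps an expert to {strategy name: definition}.
    Definitions are compared after lowercasing and whitespace normalization.
    """
    names_by_definition = defaultdict(set)
    for strategies in individual_models.values():
        for name, definition in strategies.items():
            normalized = " ".join(definition.lower().split())
            names_by_definition[normalized].add(name)
    return {
        definition: sorted(names)
        for definition, names in names_by_definition.items()
        if len(names) > 1
    }


# Hypothetical individual models.
models = {
    "expert_a": {
        "read-open": "Read with an open acquisition intention",
        "keep-item": "Keep an item in working memory",
    },
    "expert_b": {
        "skim-open": "read with an  open acquisition intention",
    },
}
duplicates = find_duplicate_strategies(models)
```

The duplicates found in this way are candidates for merging under one name in the group toolbox; whether they really are functionally identical remains a matter of interpretation.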



4.3.3 Global features of professional summarizing 

The process model of summarizing first and foremost describes the cognitive 
activities that take place during summarizing: assimilating information, select- 
ing, transforming, reformulating, and presenting. It explains the individual 
thinking procedures by reducing them to simpler, more transparent cognitive 
acts and by indicating their reasons and their effect on data. It focuses on con- 
crete information processing. Cognitive activities and structures of wider range 
have not been at the focus of primary observation. They are instead derived by 
abstraction from evidence inside working steps.

As is usual in grounded theories, the structure of the model is determined by
factual knowledge about its domain. Its organization is shaped by the knowl- 
edge content as a bag is shaped by the goods it carries. Since it organizes ob- 
ject knowledge, the model also functions as a formal model. We will go 
through its organizational formal structures in detail, reviewing: 

• the grounding of the model in its domain 

• the characteristics of the summarization task family with the subtasks ab- 
stracting, indexing, and classifying, as well as co-occurring bibliographical 
description 

• global features of process organization 

• strategies 

• the intellectual toolbox (the repertoire of strategies) 

• the structure of working steps 

• task-oriented memory schemata 

• the basic grid of working processes 

• the design of real-world summarizing processes 

• working plans and subplans 

• professional summarization competence and general personal skills, i.e., 
the integration of expertise in the expert’s mind 

The grounding of the model in its domain. In the case study approach the 
researcher constructs the overall model in two stages. First (s)he constructs the 
case studies, which are here individual models of summarizing expertise, then 
(s)he sets up the group model integrating the individual models. Figure 4.13 
shows how every case study or individual model works like a probe in the 
domain of professional summarizing and how the overall model aggregates all 
findings. 

The six experts who have contributed their know-how to the model are inde- 
pendent personalities who are able to state the opinion of their field. They 
come from different disciplines: Harold Borko is a psychologist and informa- 
tion scientist, Edward Cremmins has an education as translator and technical 
editor, Ingetraud Dahlberg is a philosopher and information scientist, Andreas 
Gerards is a psychologist, Marliese Guenther is a biologist, and Hannelore 
Schott is a communication scientist. With one exception, the experts in the 
group have a mixed background of practical experience and teaching activi- 
ties. Half of the group has actively contributed to research in summarization. 
Two of the experts are US citizens. They participate in the American discus- 
sion and have limited ties to Europe. Among the four German experts, three are 
German counterparts of the US American experts. For them, international co- 
operation takes place through the participation of their institutions in interna- 
tional English-speaking information systems, but the anglophone influence in 
their everyday working life is limited. The fourth German expert is different. 
She is an international personality, the editor of an international journal, and 
thus works in English as well as in German.

The composition of the group emphasized diversity. By applying the case study
methodology, individual profiles of the experts gave every chance for expres- 
sion of a potentially divergent professional practice. Although the group of 
summarization experts is not representative in the sense of a statistical sam- 
ple, the fact that they reach an interpersonal consensus in spite of a heteroge- 
neous environment is a good indication that the results approximate the truth 
about professional summarization. From a socio-professional view, six inde- 
pendent experts are able to define the opinion of the domain such that the re- 
sult would not change dramatically if other experts joined. 

Indeed, a substantial core of the observed strategies belongs to the knowledge of
the whole group: 83 strategies are used by all experts of the group, 60 strate- 
gies are shared by five experts, another 62 strategies are common knowledge 
of four summarizing experts, 79 strategies belong to the repertory of three of 
the summarization experts, 101 strategies are used by two experts, and 167 
strategies are individual. Given the conditions of the experiment, this level of 
agreement cannot be due to the documents, because they were different for 
almost all working processes. 
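
The figures in this paragraph can be tallied in a few lines. The sketch uses only the counts stated above. The derived values (the toolbox total, the share of non-private strategies, and the mean number of strategies per expert) are computed here under the assumption that every strategy is counted once for each expert who uses it.

```python
# Number of strategies by the number of experts who use them (from the text).
strategies_by_sharing = {6: 83, 5: 60, 4: 62, 3: 79, 2: 101, 1: 167}

# Distinct strategies in the group toolbox.
total_strategies = sum(strategies_by_sharing.values())

# Strategies known to at least two experts, i.e. not private.
shared_strategies = total_strategies - strategies_by_sharing[1]

# Each strategy counts once for every expert who uses it.
strategy_uses = sum(
    experts * count for experts, count in strategies_by_sharing.items()
)
mean_per_expert = strategy_uses / 6

print(total_strategies)                                    # 552
print(shared_strategies)                                   # 385
print(round(100 * shared_strategies / total_strategies))   # 70
print(round(mean_per_expert))                              # 275
```

About 70 percent of the distinct strategies are thus used by at least two experts.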

There are no inexplicable differences in the number of strategies used (see 
Table 4.4). The summarizer Andreas with only 221 strategies is the fastest of 
the whole group. His aim is to be fast rather than sophisticated, which explains 
why he is restrictive in his intellectual tools. As he is so fast, observation time 
is shorter than with his colleagues, with the consequence that a smaller num- 
ber of his strategies may have come to the attention of the researcher. Edward 
is significantly above average with his 367 strategies. He is personally very in- 
volved in his work, develops more activities, and takes more time. 

Private strategies are strategies which have been observed with one individ- 
ual only. Among Andreas’ 20 private strategies, 11 can be explained by his 
tasks. He needs additional strategies because he is the only one of the experts 
to translate, and he has to sign his work and to comply with descriptive cata- 
loging specialties. He participates with some 200 strategies in the common
method pool and needs some additional 10 percent to adapt to local conditions
with a minimum scope for personal freedom. There is nothing peculiar in such 
behavior. The same principle applies to all six experts. At the other end of the 
scale, we find Edward with 54 personal strategies. He uses 9 personal meta- 
cognitive strategies. Nobody will begrudge him for that. Another 9 strategies 
deal with writing and graphics, 2 strategies are needed to update his logbook, 
etc. The general conclusion is that all experts share common methods knowl- 
edge to a large degree, but that each one has, in addition, a small private 
stock of methods which serve purposes special to her or his tasks or reflect an 
individual working style. 
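
The "additional 10 percent" can be checked against the counts given for the two extremes of the group. The sketch below expresses private strategies as a percentage of the strategies an expert shares with others. The function name is hypothetical, and the reading of "additional" as "relative to the shared stock" is an assumption.

```python
# (total observed strategies, private strategies) as reported in the text.
experts = {"Andreas": (221, 20), "Edward": (367, 54)}


def private_surplus(total, private):
    """Private strategies as a percentage of the expert's shared strategies."""
    return 100 * private / (total - private)


surplus = {
    name: round(private_surplus(total, private), 1)
    for name, (total, private) in experts.items()
}
print(surplus)
```

Under this reading Andreas adds roughly 10 percent to his shared stock of 201 strategies, and Edward roughly 17 percent to his 313.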



Table 4.4. Number of observed strategies per expert

Table 4.5. Number of strategies shared by pairs of experts

Table 4.5 lists the number of strategies used in common by two experts of the
group. The general image is that everybody is linked to any colleague by a 
substantial number of common strategies (minimum 140, maximum 212). 
Since Edward contributes a high number of strategies of his own, he is able to 
share more of them with others than, for instance, Andreas, who himself is a 
thrifty strategy user. Between these two extremes, all experts have enough in 
common to set up a pool of shared competence. 

We conclude the following:

• Since the descriptive statistics of six very different experts do not differ
too much, we may conjecture how a seventh expert would be integrated in 
the group. (S)he would contribute some 250 strategies and share some 90 
percent of them with at least one other, the remainder being personal. 

• Private strategies are limited in amount and scope. If experts had to fun- 
damentally adapt their intellectual tools to individual documents, we 
would expect to see many more private strategies - strategies which have 
been activated by a particular document treated by the respective sum- 
marizer. As this is not the case, we conclude that experts’ routine compe- 
tence is reasonably stable with respect to the demands of individual 
documents. A summarizing expert is not fazed by a routine document. 

• There is no danger that the common intellectual toolbox might grow un- 
controllably if more experts join the group, because the number of over- 
lapping strategies is high. The toolbox contains a common core and private 
spheres where everybody stores their own techniques. These must be sepa- 
rated from the common usage area because they risk multiplying with the 
size of the group. 
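
The conjecture about a seventh expert can be turned into a back-of-the-envelope projection. The sketch assumes that all of the newcomer's shared strategies are already present in the toolbox. The current toolbox size of 552 is the sum of the strategy counts reported earlier in this section (83 + 60 + 62 + 79 + 101 + 167).

```python
current_toolbox = 552         # distinct strategies observed with six experts
new_expert_strategies = 250   # "some 250 strategies" conjectured in the text
shared_percent = 90           # "some 90 percent" shared with at least one other

# Integer arithmetic avoids floating-point rounding in the split.
shared = new_expert_strategies * shared_percent // 100
private = new_expert_strategies - shared

projected_toolbox = current_toolbox + private
growth_percent = 100 * private / current_toolbox

print(shared, private, projected_toolbox)   # 225 25 577
```

Under these assumptions the toolbox grows by fewer than five percent, which illustrates why the common core is not expected to grow uncontrollably.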

The summarization task family. Seen with a practitioner’s eyes, summariza- 
tion tasks are characterized by their outcome. In an information environment 
we have abstracting yielding abstracts, indexing bringing about indexing terms 
or descriptors, and classifying coming up with classification codes. 
Bibliographic description is often added to the duties of summarizers for 
practical reasons. Its results are cataloging data. 

The common core of summarization expertise is contained in the special 
preparations of summarization used in information environments. It comprises 
the strategies that contribute to all summarization products. As the outcomes of 
abstracting, indexing, and classification differ, a first guess is that the differ- 
ences between the three types of summarizing are first and foremost presenta- 
tional. The family resemblance must lie in techniques of information acquisi- 
tion, relevance assessment, and reduction to the essential meaning items. 

However, outcomes spread back to all the processes that try to reach them, 
not only to the presentational ones. Thus the information size of the summary 
may influence the whole summarization process: a relatively short summary in 
the form of an indexation needs less information than a textual abstract, so 
economical document exploration techniques may be used for indexing, while 
acquisition strategies for abstracting must dig deeper and unearth more rele- 
vant material. After this consideration we assume that the common core of 
summarizing indeed resides in information acquisition, relevance assessment, 
and the reductive reworking of the document meaning, but that it includes 
processing options for different sorts of target information, such as textual 
summaries, indexes, or other summary types which are not discussed here.

Information acquisition. When practitioners talk about the three professional
types of summarizing mentioned above, information acquisition is often disre- 
garded although it constitutes a considerable common body of expertise for all 
summarizing tasks. Knowing how to determine the knowledge from the original 
that is to be represented in a summary (i.e., subject analysis) is a crucial sub- 
competence of a professional summarizer. Most of the effort in the summariz- 
ing process is spent on reading, though production activities like keying in a 
text may, by their greater visibility, push to the front. Corresponding to the 
cognitive effort invested, the information acquisition competence is rich in 
cognitive strategies. Readers may look in the intellectual toolbox to appreciate 
the number of strategies dedicated to information acquisition by the six con- 
tributing experts. 

Information acquisition including relevance assessment gives professional 
summarizing a stable common ground in methods, whereas the methods of in- 
formation presentation vary according to the target representations. 

Abstract production. An abstract is a short coherent text which has to inform a 
user about the essential knowledge conveyed by a document. While producing 
abstracts, professional summarizers are specialized writers. Typically they 
follow the practices of professional writing, taking notes, drafting and revising 
their target text. They apply strategies that are typical for discourse production, 
doing their best to create summaries that will be functional for their users. 
Among other things, abstracts cannot be useful without being understandable. 
However, their understandability is easily hampered by coherence gaps. After 
all, they have been put together from meaning scraps of the source document. 
Summarizers may expend considerable effort to upgrade the results of their ex- 
ploration, preventing them from being incoherent, misleading in their new con- 
text, expressed in the wrong nomenclature and so on. Reworking the for- 
mulations, summarizers comply with stylistic features as given in abstracting 
guidelines, making their abstracts concise, direct, loaded with data, free of 
jargon, etc. Although summaries are short and technical texts, summarizers 
have to pay attention to their communicative function and rhetorical balance. 
These examples suffice to explain that producing abstracts, even when more or 
less restricted to presentational issues, involves many cognitive activities. 
Indeed, the six experts used 103 task-oriented strategies for abstract production. 

Indexing. Indexing and classifying (i.e., indexing using the notations of a classi- 
fication scheme) are extreme forms of summarization. They yield summaries 
that consist of a chain of mostly nominal expressions. Concepts may be related 
by simple semantic links, but the relations are much weaker than the textual 
semantic relations that incorporate coherence in a textual summary. 

Indexing and classifying appear as special professional types of summariz- 
ing, which are more restricted in their requirements, are bound more strictly to
retrieval necessities, and use controlled vocabularies instead of natural lan-
guage. The six experts of the group applied 46 special indexing strategies. 

Since indexing is bound to technical use in information retrieval much more 
directly than the production of textual summaries, indexers must respect a 
body of presentational prescriptions, often sticking to the descriptors of a given 
thesaurus. Thesaurus use and obedience to indexing rules presuppose additional 
technical skills which are most often visible in an indexer’s competence. 
Presenting an index with a standardized vocabulary and compliance with in- 
dexing rules involves additional intellectual work. The target vocabulary sets 
the conditions for expression, hence it is often the source of concepts which 
are assigned to the document, at the price of diverging from the author’s for- 
mulations and with the merit of normalizing concept presentation inside a 
given information system. 
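
How a target vocabulary normalizes concept presentation can be shown with a minimal sketch. The thesaurus below is invented for illustration, as are the function name and the candidate terms. A real thesaurus also carries hierarchical and associative relations, which are ignored here.

```python
# Hypothetical thesaurus: entry terms (author wording) -> preferred descriptors.
THESAURUS = {
    "abstracting": "Abstracting",
    "summarising": "Abstracting",
    "summarizing": "Abstracting",
    "full-text database": "Full-text databases",
    "indexing": "Indexing",
}


def assign_descriptors(candidate_terms, thesaurus=THESAURUS):
    """Map candidate terms to descriptors; collect terms without an entry."""
    descriptors, unmapped = [], []
    for term in candidate_terms:
        descriptor = thesaurus.get(term.strip().lower())
        if descriptor is None:
            unmapped.append(term)          # needs the indexer's attention
        elif descriptor not in descriptors:
            descriptors.append(descriptor) # normalized, no duplicates
    return descriptors, unmapped


descriptors, unmapped = assign_descriptors(
    ["Summarizing", "abstracting", "Full-text database", "cognitive toolbox"]
)
```

Two author formulations collapse into one descriptor, which is the normalizing effect, while the term without a thesaurus entry is left over for intellectual treatment by the indexer.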

An indexation should work as a retrieval tool in a specific retrieval environ- 
ment. Circumspect indexers tune their descriptors to be effective during search. 
Improving retrieval effectiveness gives rise to a couple of more specific index- 
ing strategies which are not common in summarization. 

Picking the right words from a predefined thesaurus makes things easier 
when indexing routine documents that do not propose new concepts. Innovative 
documents require the indexer to project new concepts onto the old ones. The
projection can be difficult intellectual work. All in all, struggling with re-
strictions of expression can turn indexing into a demanding type of summariza-
tion.

While most special traits of indexing reside in the presentational subtasks, 
they obviously have effects on document exploration behavior. Acquiring 
enough document knowledge to state a dozen main concepts is often faster 
than exploring a document for textual summarization, simply because only a 
limited number of individual concepts must be understood. Their relations can- 
not be expressed in the index, so why look for them in the source document? A 
sufficient choice of concepts is often found in the title and the headings, so 
why dive laboriously into the body text? Descriptors express isolated concepts, 
so why not restrict reading to words of this type, for instance by exploiting the 
back-of-the-book index? 

Most of the time indexers reach their aim faster than summarizers doing ab- 
stracting, i.e., with less intellectual work. How much work they invest depends 
on the number of descriptors, varying between something like 2 descriptors 
presenting a minimal indexation and some 20 corresponding to a deep one. 
More often than for abstracting, summarizers doing indexing rely upon the 
visible outline of the document, especially upon the title. Indexers also fall 
back on typical abstracting strategies. Nevertheless it would be rash to charac- 
terize indexing as a more superficial style of summarizing than abstracting. 
Summarizers may explore their source document more intensively when in- 
dexing than they do for summarization purposes. It is true, however, that prac- 
tical indexers cannot invest more than about a quarter of an hour per document.

Classifying. The experts whose professional convictions are presented here
have considered classifying as a broad classification task, ending up with 
about three concepts. For this reason alone, classifying appears as the least 
important of the professional summarization tasks that have been investigated. 
During the working processes, classifying appears as a short prologue or epi- 
logue of indexing. Seen from a functional point of view, the classification 
strategies are almost always doubles of the corresponding indexing strategies. 
This means that indexing and classifying expertise merges. A difference be- 
tween the two tasks can only be derived from the different target vocabularies. 
In the intellectual toolbox, classifying is represented with 18 strategies. We 
may add to its account the 7 indexing strategies which are also used in 
classifying. 

Bibliographic description. Since experts often integrate bibliographic descrip- 
tion into their summarizing tasks, we follow their practice. For the summariz- 
ing tasks, bibliographic description is not completely insignificant because it 
ensures that core formal document information, especially the document title, 
is perceived right at the beginning of the working process. Moreover, the de- 
scriptive cataloging task confronts the expert with an input sheet often requir- 
ing her or him to tick several categories, such as the document type and the 
style of treatment. These categories ensure that the central professional sche- 
mata and relevant fact knowledge are activated. For instance, if the document 
is a review article, the summarizer reactivates knowledge about review arti- 
cles while ticking the respective category. If the title announces the topic The 
new family man, the expert activates knowledge about father roles while en- 
tering the title. Incidentally, the experts put a limited mental effort into biblio- 
graphic description. In the intellectual toolbox we find only 13 cataloging 
strategies. 

Global features of process organization. Professional summarizing must 
have a process architecture. Otherwise, that is if experts were unable to organ- 
ize their working processes, they could not work successfully. The following 
features of process architecture explain to a large extent why summarizing in a 
professional routine environment is feasible, and they show how to modularize 
the model. 

Basic units 

1. Experts work step by step. Their working processes are subdivided into 
working steps (phases of cognitive activity delimited by boundary markers). 
The internal structure of steps is discussed below. 

2. Strategies are frequently occurring routine thinking tools for special tasks. 
They are the vehicles of cognitive action to be observed in working steps. 
They are described below and listed in the intellectual toolbox.

3. Every individual working step has an internal structure. It is executed by
cooperating strategies. The leading strategy links the working step to the 
overall working plan or to the metacognitive steering component. 

Process organization 

4. Experts follow a global experiential working plan. The working plan is not 
rigid but adapts to current demands. The matrix working plans for all fre- 
quent and relevant cases are stored in memory and reactivated when 
needed. Having these pre-established plans guarantees the routine perform- 
ance of an expert. A more detailed discussion follows below. 

5. Professional summarizing processes have a consistent overall strategy. 
From the original, knowledge items are extracted. They are either discarded 
along the way or transported into the target representation, possibly using 
intermediate representations. As they are transported, they may be trans- 
formed. 

6. Any element of processing goes through the basic cycle of textual knowl- 
edge processing: it is taken up, possibly transformed and stored again. At 
the end of the cycle the result may be checked and revised if needed. 

7. The expert has a basic grid of information processing, beyond which (s)he 
designs individual working processes flexibly according to content aims and 
other parameters. As stated in observation no. 5, all information items are 
treated consistently. The sequence in which this happens is free. It may 
follow well-established plans (see observation no. 4), but any other organi- 
zation criteria, personal playfulness included, may influence the process as 
well. 

Knowledge resources 

8. Metacognitive activity is invested in the form of volitional and control
efforts. General literacy techniques (reading, writing, and thinking) are
indispensable preconditions of expertise.

9. There is an inventory of summarizing strategies which we call the intellec- 
tual toolbox. It describes how a professional summarizer’s expertise is 
composed and embeds professional summarization in more general intellec- 
tual skills. 

10. Task-oriented memory schemata structure the data being processed. On the 
one hand, they impose meaningful views upon the objects of processing. On 
the other hand, they constitute the working areas where the mental models 
of objects are gradually built up in the course of the working process. 

Strategies. By strategy we understand a specific cognitive process, a unit of
knowledge about methods, or an intellectual tool. Just as there are material
tools for mechanical work, such as hammers, forks, combs, screwdrivers, or
scissors, we have intellectual tools for intellectual operations. Of course, they can work
with representations instead of materials. Just as we use a comb to style our 
hair or a fork to select a piece of meat, we use a formulation tool to style our 
sentences or an information-seeking tool to select useful items from the incom- 
ing information flow. We expect material tools to fulfil their intended purpose. 
A hammer incorporates a proven strategy, for example, to drive a nail into a 
wall. If the wall is too solid, the intended strategy of the hammer fails. In this 
case we turn to an electric drill, which implies a more elaborate approach 
because it drills out a hole in which to fit the nail or the screw, instead of 
trying to drive in the nail with brute force. In the case of material tools, the 
function determines the form. With intellectual tools, a detailed definition of 
the typical procedure (a program) in our brain fulfills the same function as the 
strategic form of material tools. Just as a comb has teeth to separate different
fibers, the no-void strategy, for example, needs knowledge of methods to comb
out empty phrases from the input text.

The intellectual tool metaphor defines professional summarizing strategies 
by means of their intended function. 

In their empirical form, strategies have a name, a short definition, and a set 
of natural situations where they occur. For instance, the strategy hold is listed 
in the intellectual toolbox with its name and definition as follows: 

hold: Hold a meaning unit in store. (Andreas, Edward, Hanne, Harold, Inge,
Marliese)

It is used by all experts of the group. It can be observed in operation in the 
working steps Mackin-1, Mackin-4, and Mackin-5, to mention a tiny sample 
of its occurrences. 

The empirical description of strategies prepares their realization as implemented
agents: then, the tools are ascribed activity and intentionality of their own.
The strategy definition is compatible with established ideas of how to define
cognitive processes (e.g., WINO83) or cognitive agents: a strategy must
possess some program that decides about its actions, a facility for communi- 
cation with other strategies, some private knowledge, and possibly an internal 
working memory, and it must be able to take input and to produce output, de- 
riving its own task-oriented view of data (for more see the introductory hyper- 
text of the SimSum system). 
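
The components just listed can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class name Strategy, its field names, and the toy program for hold are invented for this example and are not taken from the SimSum implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Strategy:
    """Hypothetical strategy agent: a named intellectual tool."""
    name: str
    definition: str
    # The "program" decides what the strategy does with its input.
    program: Callable[["Strategy", str], str]
    private_knowledge: Dict[str, str] = field(default_factory=dict)
    working_memory: List[str] = field(default_factory=list)
    # Messages addressed to cooperating strategies (assumed mechanism).
    outbox: List[str] = field(default_factory=list)

    def run(self, meaning_unit: str) -> str:
        """Take input, apply the program, and return the output."""
        return self.program(self, meaning_unit)


def hold_program(agent: Strategy, meaning_unit: str) -> str:
    # "Hold a meaning unit in store": keep it in working memory unchanged.
    agent.working_memory.append(meaning_unit)
    agent.outbox.append(f"{agent.name}: stored 1 unit")
    return meaning_unit


hold = Strategy(name="hold",
                definition="Hold a meaning unit in store.",
                program=hold_program)
result = hold.run("title concept")
```

The sketch keeps the program separate from the agent's data so that the same container can carry any strategy from the toolbox; this separation is a design choice of the example, not a claim about the original system.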

The intellectual toolbox. The repertory of summarization strategies is called 
the summarizers’ intellectual toolbox. It stores and organizes the 552 strategies 
observed during the 54 working processes of the investigation. It structures and 
describes the professional summarizers’ expertise. The toolbox is appended to 
this chapter. 

The toolbox is arranged according to the function of strategies. For pragmatic 
reasons its classification is monohierarchical. It reflects standard assumptions 
about the functions of the strategies without precluding additional applications, 
just as we would normally place and expect to find a pair of scissors in a 
sewing box, without excluding the possibility of using them as a paperweight
or hammer substitute. 
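
A monohierarchical classification means that every strategy is filed under exactly one class. The Python sketch below illustrates that constraint only; the class labels ("storage", "reduction", "exploration") and the assignment of strategies to them are hypothetical and do not reproduce the toolbox appended to this chapter.

```python
from typing import Dict, List

# Hypothetical function classes; the real classification is not reproduced here.
toolbox: Dict[str, List[str]] = {
    "storage": ["hold"],
    "reduction": ["no-void"],
    "exploration": ["explore", "search"],
}


def class_of(strategy: str) -> str:
    """Return the single class under which a strategy is filed."""
    homes = [cls for cls, members in toolbox.items() if strategy in members]
    if len(homes) != 1:
        raise ValueError(
            f"{strategy!r} must be filed under exactly one class, found {homes}")
    return homes[0]
```

Filing a strategy in one place does not restrict where it may be applied; the lookup only records its standard function.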

The structure of working steps. Figure 4.14 illustrates how summarizing 
steps are realized by cooperating strategies. A leading strategy links the cur- 
rent step to the working plan. It ensures that the step contributes to the overall 
goal, regardless of any other activities that may occur as well. When a working 
step serves social functions or is dedicated to planning activities, it is directly 
subsumed by the metacognitive self-steering component (compare the planning 
step Hearn-6 (Fig. 4.44) in the Hearn sequence below).




Fig. 4.14. Basic structure of a working step 



Task-oriented memory schemata. A model of a cognitive activity must ex- 
plain how the mind deals with factual data. The general explanation says that 
humans use schemata (THOR79, RUME84, BREW84). The role of schemata 
in memory organization, understanding and summarizing was discussed in 
Sects. 2.4.3 and 3.3.3, while Sect. 2.3.4 related them to mental models. Sche- 
mata represent objects (referents). They are reworked through thinking pro- 
cesses. Information which can be integrated into existing schemata is easier to 
process. Scheme knowledge can even compensate for missing concrete infor- 
mation because often we know anyway (by default) what to put in. It is normal 
for cognitive schemata to have a structure of predefined categories or slots. 
They state what information is important in the context of the scheme, and 
give it a role. Schemata are activated as long as they are needed. Since 
schemata can be embedded into larger schemata, they can grow into a com- 
plex knowledge structure. The more a schema is used and the more frequently 
the same or similar data is processed, the more profoundly the schema is en- 
coded in memory. The better a person is acquainted with a domain, the more 
developed, flexible, and filled with knowledge her or his respective schemata
are. Routine competence is based on tried and tested schemata. 
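
The slot structure and the role of defaults can be made concrete with a minimal Python sketch. It assumes a schema is nothing more than a set of named slots with optional default fillers; the class name Schema, its methods, and the sample slot values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Schema:
    """Hypothetical schema: named slots with optional default fillers."""
    name: str
    defaults: Dict[str, Optional[str]]
    fillers: Dict[str, str] = field(default_factory=dict)

    def fill(self, slot: str, value: str) -> bool:
        """Integrate a value if the schema has a slot for it; otherwise reject it."""
        if slot not in self.defaults:
            return False
        self.fillers[slot] = value
        return True

    def get(self, slot: str) -> Optional[str]:
        """Return the concrete filler, falling back to the default."""
        return self.fillers.get(slot, self.defaults[slot])


article = Schema("journal article",
                 defaults={"title": None, "author": None,
                           "outline": "introduction ... conclusion"})
accepted = article.fill("title", "A study of indexing")
rejected = article.fill("advertisement", "Buy now")
```

Here the outline slot is answered from the default although no concrete outline has been read yet, which mirrors how schema knowledge compensates for missing information.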

Task-oriented memory schemata store knowledge and they function as pre- 
structured windows through which the intellectual processes accept data from 
an external input or from memory and return it after processing, either to mem- 
ory or to an external representation. They shape the working process by shaping 
its data. For professional summarizers, familiar object types and their represen- 
tation schemata are the safe haven during the working process. Various docu- 
ment types such as system descriptions, news articles, essays, empirical pa- 
pers, as well as the thesaurus, the classification scheme, a data entry scheme, 
the indexation, and the abstract, all structure the professional view of the 
domain. By focusing on these central objects, the expert excludes most avail- 
able knowledge from his or her attention. For instance, by providing input 
sheets on hard copy or computer display, the working environment helps the 
summarizer to concentrate on the right information types and schemata. 

When the summarizer starts working, the task-related cognitive schemata 
are prestructured and empty, just like an empty sewing box or bookshelf. Be- 
cause of its extension, the document representation must be the most articu- 
lated schema of professional summarization. Through the whole working pro- 
cess, the representation of document knowledge is reworked and kept active; 
only afterwards is it by and large forgotten. The summary representation is ac- 
tive in memory as long as the summary is under construction. 

In the case of professional summarizing, we need three standard memory ar- 
eas for document knowledge (see the exhibit of any concrete working step): 

• the document surface representation 

• the document scheme representation 

• the document theme representation. 

The three document-oriented areas combine to set up a simple mental model
of the document. They are complemented by an integrated summary representation.
For the sake of clarity, the three representations are separated, although
the cognitively plausible assumption is that they are views which are
dynamically derived from an integrated representation.

Document surface. The document surface area keeps the wording and outer ap- 
pearance of the document, as far as it has been perceived. It records the sur- 
face layout of the document and its units, such as paragraphs or tables. How 
detailed the representation is depends on the thoroughness of perception. Re- 
presentation gaps are normal. 

Document scheme. Professional summarizers approach a new journal article or 
any other document type with some general expectations: an article will be of 
a certain type, it has a title, it indicates the author(s), and it somehow has an 
outline including an introduction and a conclusion. The expert’s prior knowledge
about document types and their information structure is stated by the respective
document schemata. Thus the expert obtains valid expectations,
which allow her or him to interpret incoming data according to the right ques- 
tions, to foreground important data, background other elements, and reject 
anything that does not fit into the scheme. The applicable document scheme is 
activated at the beginning of the working process when the expert retrieves her 
or his professional knowledge about document representation from memory. 
Now the knowledge structure is prepared to accept a document outline and all 
knowledge directly attached to it as soon as the current document provides 
more detailed cues. 

Document theme. The document theme area is equipped with prior knowledge 
about the semantic structure of objects. Conceptual relations such as restate- 
ment or elaboration provide the structural canvas of the theme representation. 
In practical work, we use semantic relations defined in RST (Rhetorical 
Structure Theory - MANN88). The prestructuring of the theme representation 
constrains incoming knowledge to core object representations or macrostruc- 
tures, starting in practice almost always with the concepts given in the docu- 
ment title. Semantic relations link knowledge items from the incoming docu- 
ment with the thematic core. 

Summary representation. The summary needs a memory area of its own. In 
principle, it is structured like input. There must be a thematic structure (the 
text plan), a linear organization corresponding to the outline and a linguistic 
and presentational surface. In addition, an output representation must contain a 
working store for stages of text in progress: unformulated ideas, text under con- 
struction, planning data of a sentence, or a revised version of a text passage. 

The basic grid of working processes. Expert summarization processes are 
organized and flexible. This is made possible by a basic grid which serves as a 
platform for constructing concrete working processes. The basic grid can be 
characterized in terms of the overall knowledge processing strategy and from 
the viewpoint of individual meaning items. 

The overall knowledge-processing strategy. The aim of professional summariz- 
ing is to derive a short representation of a document from a long one by choos- 
ing an appropriate part of its meaning and transforming it into a target dis- 
course. Practically speaking, the summarizer extracts knowledge items from 
the original document selectively. (S)he either discards this knowledge along 
the way or puts it into the target representation, possibly going through a num- 
ber of intermediate processing steps. The steps differ in character, including 
subtasks such as document exploration and summary production. Since the 
working process is far from easy, it is often performed incrementally, i.e., by 
repeated approximation steps. For the same reason, a check and a revision of
the result is often the last activity. 

The basic cycle applied to objects. Every meaning item processed during sum- 
marizing goes one or more times through the basic operation cycle. It is taken 
up, transformed, and stored again. Standard transformations are operations such 
as selection, copying, assessment, reformulation, and semantic reorganization. 
At the end of the cycle, errors are repaired as needed. Then the next item is 
processed. 
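
The basic cycle can be written down as one small function. This is a minimal sketch, assuming that a meaning item is a string kept in a dictionary store; the function name basic_cycle, its parameters, and the sample sentence are hypothetical.

```python
from typing import Callable, Dict

Transform = Callable[[str], str]


def basic_cycle(store: Dict[str, str], key: str, transform: Transform,
                check: Callable[[str], bool], repair: Transform) -> str:
    """One pass: take up an item, transform it, repair it if the check fails, store it."""
    item = store[key]             # take up
    result = transform(item)      # transform
    if not check(result):         # check at the end of the cycle
        result = repair(result)   # repair as needed
    store[key] = result           # store again
    return result


store = {"s1": "  In this paper we present a new indexing method  "}
basic_cycle(store, "s1",
            transform=lambda s: s.replace("In this paper we present", "Presents"),
            check=lambda s: s == s.strip(),
            repair=str.strip)
```

In this run the reformulation leaves stray blanks, the check notices them, and the repair removes them before the item is stored again.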

The design of real-world summarizing processes. Given meaning items and 
cognitive processes as basic building blocks, there are two ways to aggregate 
larger units: by subtasks and by meaning objects (see Figs. 4.15 and 4.16). In 
the first case, all items go through a specific subtask before the next subtask is 
begun. The summarizer submits items to a specific treatment such as reading 
or writing and stores them away as intermediate products, thus aggregating 
stages or phases of the working process. In the second case, every knowledge 
item is fully processed through all subtasks before the next one is picked up 
from the source document. 
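
The two ways of aggregating can be contrasted as two loop orders over the same items and subtasks. The Python sketch below rests on strong simplifications: subtasks are plain functions that return None when an item is dropped, and the number of items held at once serves as a crude stand-in for memory load. All names and sample data are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Subtask = Callable[[str], Optional[str]]  # returns None when the item is dropped


def phase_oriented(items: List[str], subtasks: List[Subtask]) -> Tuple[List[str], int]:
    """Run each subtask over all items before starting the next subtask."""
    current, peak = list(items), 0
    for subtask in subtasks:
        peak = max(peak, len(current))  # all surviving items are held at once
        current = [out for out in map(subtask, current) if out is not None]
    return current, peak


def production_oriented(items: List[str], subtasks: List[Subtask]) -> Tuple[List[str], int]:
    """Carry each item through all subtasks before picking up the next item."""
    done: List[str] = []
    for item in items:
        value: Optional[str] = item
        for subtask in subtasks:
            value = subtask(value)
            if value is None:  # the item is dropped and this move ends
                break
        if value is not None:
            done.append(value)
    return done, 1 if items else 0  # only one item is in work at a time


read = lambda s: s.strip()
assess = lambda s: s if "index" in s else None
write = lambda s: s.capitalize() + "."

sentences = [" indexing is costly ", " the weather was fine ",
             " an index helps retrieval "]
by_phase, load_phase = phase_oriented(sentences, [read, assess, write])
by_item, load_item = production_oriented(sentences, [read, assess, write])
```

Both orders yield the same output here because the toy subtasks judge every item in isolation; the difference shows only in the load figure. With real documents the orders can also differ in quality, since relevance is easier to judge once more of the document is known.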

While both a clear-cut object-oriented and a clear-cut subtask-oriented
process organization occur, summarizers often also blend the approaches
depending on their will, the difficulty of the document, or other parameters.
According to the summarizer’s privileged goal, we may characterize the organization by
subtasks as phase-oriented and the organization by units as production-oriented.
Incidentally, the basic organization forms are the same for textual summariz- 
ing and for indexing and classifying. Summarization even shares them with 
expository writing in general. 

Phase-oriented summarizing. The more frequent organization of the summari- 
zation process goes by subtasks, having one subtask after the other deal with 
all available objects (see Fig. 4.15). 

First, the summarizer explores the document, storing all acquired infor- 
mation items in memory and possibly on an external medium by marking them 
and taking notes. After exploration, all results on store are assessed for their 
usefulness. At the end of this phase, the relevant items have been singled out. 
They are the next intermediate product of the summarization process, the part 
of the document representation that is kept for the summary. Starting from this 
summary representation, the abstract or indexation is produced in the next 
working phase. At the end, the draft may be revised. Revision, i.e., improvement
of an already existing text or representation, happens during all sorts of
information production. When a summarizer revises, (s)he (re)applies principles
of summarizing to improve the product, as a novel writer might reshape 
characters, improve suspense, and so on. Revision can make up for omissions, 
but it can also profit from more information, especially when, towards the end 
of a production process, a passage can be criticized in its (almost) finished
context. 

Figure 4.15 illustrates a working process that goes through one subtask after 
another. First, we have the exploration activities 1, 2, 5, 6, 7, 9, and 10. They 
pick up information units from the source document and deposit them in the 
surface representation scheme. Assessment subtasks (11-14) access these and 
copy relevant items to the permanent store. After that, some relevant items are 
noted or marked (steps 21 and 22). In the next working phase, the summary
information is put together. Subtasks 31-33 assemble the material both from
notes and from the memory representation. Finally, the steps 41-43 produce
the written summary.




Subtasks shown: reading for understanding; relevance assessment and encoding;
note taking or marking; integration into summary plan; summary production

Fig. 4.15. Subtask- or phase-oriented organization



In certain respects, phase-oriented processing takes the easy way because it 
separates one type of process such as reading or thinking about the importance 
of items from another. As the information items are accessed several times and 
in context (first read and understood, then assessed for their relevance, later 
reintegrated into the target text plan, etc.), the summarizer is more certain to 
bring her or his work to a good conclusion, to learn enough about the
document, and to notice errors. At the moment of relevance decision and writ- 
ing, many knowledge items from the source document are known and can be 
considered together. This is particularly helpful in less transparent domains 
(e.g., the social sciences) where documents may have surprises in store. As a 
drawback, a process organization according to subtasks engenders heavy 
memory storage and access demands (see Fig. 4.15). Especially at the moment 
of assessing relevance after the exploration phase, summarizers must re- 
member or look up a considerable part of the document. Later, the amounts of 
information under active processing shrink as inessential passages are left be- 
hind. 




Subtasks shown: reading for understanding; relevance assessment and encoding;
note taking or marking; integration into summary plan; summary production

Fig. 4.16. Production-oriented summarization



Production-oriented summarizing. During production-oriented summarizing, the 
summarizer begins on the spot to produce the target summary. Unless they are 
dropped, knowledge items are handed through all applicable subtasks. They are 
first acquired by document exploration, then assessed for their relevance, inte- 
grated in a target statement and lastly written onto an external medium such as 
a sheet of paper. Figure 4.16 shows a segment of a production-oriented
summarizing process. Here, a move is responsible for an output item, inte- 
grating all necessary subtasks. It may achieve its aim or fail. In the figure, the 
moves 1 and 2 reach their aim using all intermediate representations. Move 3 
skips the external note or marking. For lack of relevant input, moves 4, 5, and 
6 do not cross the surface representation and abort, whereas move 9 is suc- 
cessful again. 

The production-oriented working style has the advantage of dealing with one 
information item at a time, hence keeping the document-related memory load 
small. A problem is that the summarizer has to decide about the use of an in- 
formation item more or less at the reading time, often in isolation without the 
larger context available after a more complete exploration of the document. 
However, summarizers who are acquainted with their domain and its document 
types can compensate for missing information by creating the context for inter- 
pretation from their own knowledge. In the case of error, this is more risky in 
abstracting than in indexing because a textual summary makes shortcomings 
more visible than an indexation. Perhaps for this reason, the approach is less
frequent when writing an abstract, whereas it is common in indexing processes.
The Trueby abstracting process (below) is production-oriented. 

Processes with intensive and extensive information use. Irrespective of the 
global organization, working processes differ in input-output economy. In rela- 
tion to the output they are to produce, experts may read much or just enough to 
provide the output material. When the acquired information is utilized 
intensively in the target representation, the process is economical, i.e., there is 
little informational waste. Almost all acquired information items appear in the 
target representation. In working processes with extensive information 
acquisition, on the other hand, the yield is lower and much more information 
material is discarded during processing. 
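
The input-output economy can be expressed as a simple ratio of used to acquired items. The sketch below is an assumption made for illustration: the function name information_yield and the item counts are invented and do not come from the observed working processes.

```python
def information_yield(acquired: int, used: int) -> float:
    """Share of acquired information items that reach the target representation."""
    if acquired <= 0:
        raise ValueError("at least one item must have been acquired")
    if not 0 <= used <= acquired:
        raise ValueError("used items must lie between 0 and the number acquired")
    return used / acquired


# Invented counts: an intensive process wastes little, an extensive one discards much.
intensive = information_yield(acquired=10, used=9)
extensive = information_yield(acquired=40, used=8)
```

A value close to 1 corresponds to intensive information use; a low value corresponds to extensive acquisition with much discarded material.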

Summarizers manage their information use depending on the situation. How
much information acquisition is needed is decided not only by document
features but also by characteristics of the domain and of the expert her- or himself.
A working plan with intensive information use is possible if the document 
structure paves the way to in-text summaries. In this case, the expert is guided 
by information needs. (S)he asks if (s)he knows enough for the summary, and 
if so, (s)he no longer explores but writes. Expert competence in the field al- 
lows this economical approach to summarizing without risking serious errors. 
The expert achieves efficiency by focusing resolutely on those parts of the 
original which contain in-text summaries: title, blurb, table of contents, and 
preface. 

A process plan with extensive information acquisition goes the safest way to 
summary production. Its drawback is that it is work-intensive. The summarizer 
invests much effort in information seeking. Only a small part of the acquired 
information is finally put to use, but the summarizer may have learned a lot. 
However, the labor-intensive way to a summary is often imposed by problems
of the document: 

• If the original document is poorly structured and thus the preconditions for 
more sophisticated techniques are missing, if in-text summaries of the 
needed size cannot be found, then the expert has no other choice than to 
search through long text passages or the quality will suffer. 

• In fast-moving disciplines, such as computer science, or in broad fields 
with an intricate internal structure, such as the social sciences, a greater 
learning effort is indispensable, because it is harder to estimate the impact 
of an individual document. The consequence is either more work or lower 
quality, even for a domain expert. 

Working plans and subplans. Since experts are experienced, they must have 
a repertoire of ready-made working plans. The availability of these working 
plans ensures confident routine action in their domain. Like cooking recipes, 
plans state which materials and actions are needed in which sequence to 
obtain a result, be it a tomato soup or a classification code (HAMM90). 
Macroplans ruling actions of larger size such as preparing a meal typically 
include subplans such as cooking a soup. Plans may be configured anew at 
every application. For instance, the type of soup can depend on the contents of 
the freezer, where we may happen to find the ingredients for a vegetable soup, 
but nothing to make up an oxtail soup. Within the tomato soup recipe, we can 
exchange partial plans. In the season, fresh tomatoes may be preferred, 
whereas canned ones must do during the rest of the year. Some cooking recipes 
are difficult. We look them up when applying them. Others jump readily from 
our memory. The more professional we are as cooks, the more we know recipes 
and their variations by heart. 

The same is true of plans of professional summarizers. They are defined at 
different aggregation levels, stating for instance the procedure for drawing a 
descriptor from the document title (an operation of moderate scale) or the 
strategy for abstracting a review article (a large-scale plan). Summarization 
experts need global overall plans and a generous number of plans for subtasks. 
The overall impression is that they design their summarizing processes using 
adaptable plans, obeying the basic grid of knowledge processing without being 
trapped in it. The empirically observed subplans cover many well-known 
topics. Among them: 

• information acquisition for an indexing term 

• identifying the document topic 

• recognizing an in-text summary 

• writing a topic sentence 

• solving an understanding problem by comparing paraphrases of the difficult 
passage 

• obtaining abstract statements from headings 

• exploiting a back-of-the-book index for indexing

• checking a synonym reference 

• complementing the indexation with thesaurus concepts 

• revising verb tenses 

• including a superordinate concept in an indexation 

In the working sequences presented below, some of these problem-oriented 
subplans can be studied during execution. 
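
How a macroplan is configured anew from exchangeable subplans can be sketched in a few lines of Python. The sketch assumes that a plan is an ordered list of step names; the function names, the ordering of the steps, and the single switching condition are hypothetical, while the step labels are borrowed from the subplan topics listed above.

```python
from typing import List

Plan = List[str]  # a plan is an ordered list of step or subplan names


def acquisition_subplan(has_in_text_summary: bool) -> Plan:
    """Exchangeable subplan: the choice depends on what the document offers."""
    if has_in_text_summary:
        return ["recognizing an in-text summary"]
    return ["searching through long text passages"]


def configure_abstracting_plan(has_in_text_summary: bool) -> Plan:
    """Assemble a macroplan anew for the current document (assumed step order)."""
    return (["identifying the document topic"]
            + acquisition_subplan(has_in_text_summary)
            + ["writing a topic sentence", "revising verb tenses"])


well_structured = configure_abstracting_plan(has_in_text_summary=True)
poorly_structured = configure_abstracting_plan(has_in_text_summary=False)
```

Only the acquisition step changes between the two configurations, just as the tomato soup recipe stays the same when canned tomatoes replace fresh ones.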

Professional summarization competence and general personal skills.
Figure 4.17 reminds us of the general personal competence (listed on the left) that
is the indispensable foundation of professional summarizing expertise (listed on
the right of the drawing).

The expertise of professional summarizing is by necessity embedded in gen- 
eral intellectual competences: 

• the literacy techniques that an adequate scientific or technical education 
develops in a person 

• the metacognitive abilities that allow a person to monitor, assess, and 
guide his or her behavior 

• the skill for controlling and steering operational processes 



General competence: metacognitive steering, personal self-control; operational
control; general literacy (reading, writing, and thinking)

Summarization expertise: task-related skills; information acquisition and
relevance assessment; information presentation (abstracting, indexing /
classifying); revision, cataloging

Fig. 4.17. Summarization expertise and its intellectual underpinnings



Summarizing shares these intellectual preconditions of expertise with other in- 
tellectually demanding activities, e.g., managing a company or designing 
electronic circuits. Anyone who ignores writing, calculating, abstract thinking, 
and other cultural techniques is unable to work successfully in any of these 
specializations. The same is true for a person who cannot observe and influ- 
ence her or his own attitudes and actions, for instance by measuring them 
against general principles such as sincerity or efficiency, or who is over- 
whelmed by any task that implies coordinating a sequence of possibly different 
operations such as picking up a screw, putting it into the right hole, taking the
screwdriver, and tightening the screw. 



4.3.4 Central summarization subtasks: Exploration, 
relevance assessment, and summary production 

In order to put the most important information into a summary, we have to ac- 
quire information from the original document, to reduce it to what is most im- 
portant, and to put that into the target summary text (see Fig. 3.2). Every time 
they produce a summary, summarizers are confronted with these basic tasks of 
summarization. Recognizing them in real-world working steps and sequences 
in spite of some situational “noise” (mixing with other activities, adaptation to
local data, slicing into partial actions, etc.) is easier when their typical fea- 
tures have been studied beforehand. Hence the following section previews 
document exploration, relevance assessment with encoding of thematic knowl- 
edge, and summary production, focusing on characteristic points. In other 
words, the section prepares the study of empirical evidence from real-world 
situations. Subsequently, readers will find it easier to see the common princi- 
ples behind the particulars presented by concrete working steps. 



4.3.4.1 Document exploration 

Developed techniques of document use are an essential asset of a summarizer 
because extracting the relevant information successfully from the source 
document is the prerequisite of any worthwhile summary. For an unbiased per- 
son, it is most convenient to decide what is relevant for the summary as soon 
as complete information about a subject is available. Consequently the person 
(or a summarizer) first “reads” (decodes and interprets) the document and then 
configures a summary about it, i.e., a complete reading is followed by cogni- 
tive activities which reduce the incoming information. 

Even in everyday life, this method may prove to be too cumbersome. Read- 
ing long documents in order to produce a short target information product such 
as a summary is much too inefficient in professional environments. Instead of 
reading them, professional readers search documents for useful information. 
They know how to apply “dynamic” or “strategic” reading (PARI83, 
PUGH78). Strategies of dynamic reading or document exploration steer basic 
reading strategies such that they acquire the desired information and neglect 
the remainder of the input. The task, features of the information source and the 
situation of the person at work determine most strongly what information is 
acquired. 

All this is also true for professional summarizers. They need the information
for a short summary. It must be correct. Nevertheless there is no point in
reading more from the source document than required for writing a correct
summary. A competent summarizer can save much time by restricting reading,
concentrating on passages which yield good information and passing over other
passages with very limited intensity, or not reading them at all.

Exploration includes intentional steering. It is easy to imagine how the in- 
tention of the reader influences the process of document exploitation. Let us 
look at three reading situations: 

• During the preparation of an exam, a student consults a botany textbook to 
find out how photosynthesis works. A normal botany textbook treats many 
subjects, among which the student can hope to find photosynthesis. (S)he
refers to the table of contents to find a promising chapter and to the back- 
of-the-book index to look up where photosynthesis is treated. After that, 
only the relevant part of the book is consulted. 

• In order to win a bet, somebody uses an encyclopedia to look up the precise 
date when the Pilgrim Fathers left Plymouth harbor. To look up a date in an 
encyclopedia, we must first of all know that an encyclopedia is organized
alphabetically by the concepts that are explained. From the alphabetical 
order our subject finds the entry of interest. If there is no entry for the pre- 
cise word (s)he had in mind, it helps to know some related concepts where 
the desired information can be found as well, or a reference to it. Only
after the arrival at a promising headword (“Pilgrim Fathers”) does normal
reading activity take place. Less patient people are more likely to search 
the entry for the desired date. The output of the search is a date. 

• To justify an investment decision, a manager exploits the annual reports of 
the competitors, studies market analyses with their statistics, and keeps an 
eye on the pertinent pages of journals such as the Wall Street Journal or the 
Financial Times. When preparing an investment decision there is no time 
to waste; therefore the decision-maker tries to find relevant information
and to keep reading to the minimum. All information sources, be they sta- 
tistics or newspaper articles, must be interpreted with attention to their 
specific information structure and to the current need of information: we 
may draw pertinent figures from the market survey, whereas the newspaper 
might offer the latest opinion of a stock exchange analyst. Both pieces of 
information are useful, but in different ways. As a result, the manager pro- 
duces an argument. Its ingredients are fruits of information acquisition from 
various sources, albeit only those which foster the manager’s investment 
decision. Basic reading fades into the background. 

A person consulting an encyclopedia navigates in a certain sequence through 
the information source, finding first the cueword in the alphabetical array and 
then perhaps scanning the related article. The student exploiting a textbook 
may start a structure-driven search in the table of contents, the eyes hopping 
around following its formal organization, and then read very intensively the 
chapter about photosynthesis, the eyes moving slowly forward, going to and fro. 
The information acquisition intentions may vary from one moment to the other, 
for instance when a reader is seeking in sequence a good definition of a 
concept, looking up a figure, reading the passage about the market situation in 
India, etc. At each of these moments, the visual apparatus does a specialized 
job, acquiring information on order, using appropriate techniques. We assume 
that simply by putting the intentions together, the cognitive apparatus adapts. 




Fig. 4.18. Document exploration with intentions, exemplified by step 
Goonatilake-65 (right) 



From these observations it follows that intentions influence how exploration 
processes are executed, down to steering the movement of the eyes. This is as 
true for professional summarizers as for other dynamic readers. Therefore pro- 
fessional summarizers must have a level of process steering where the current 
acquisition intentions are expressed. The resulting structure of exploration pro- 
cesses is shown in Fig. 4.18. It is accompanied on the right by a real-world 
specimen that demonstrates how the structure is filled out by observed strate- 
gies. The figure shows that during exploration, the current information recep- 
tion intentions are formulated by the cooperating strategies unit, first, search, 
by-form, heading, and retrieve, and that external input is provided by a team of 
three optical reading strategies: read-form, read-find, and read-free. The 
reception intentions are given in content terms: the input item should be at the 
beginning of a unit, it should be a heading, it should correspond to an outline 
item, etc. The reading strategies have different functions such as copying char- 
acters and layout features, recognizing a known target, or moving around in 
direct access mode. 

When going through the empirical exploration processes discussed below, 
readers will encounter the scheme given in Fig. 4.18 behind different configu- 
rations of exploration strategies. 

The role of document types and their organization. Anyone who searches a 
document instead of reading it ends up with partial document knowledge only, 
leaving a possibly large terra incognita. The problem is how to find the inter- 
esting information in the unknown document. Professional summarizers can 
rely on their knowledge about document types. They have seen so many re- 
ports, journal articles, monographs, theses, review articles, surveys, etc., that 
they know their standard organization and other interesting features. Thus no 
document is totally new to them, and they know how to find what they are 
looking for. 

Because of its limited information size and its discernibility by layout, the 
outline of a text is immediately accessible, whereas the bulk of the text is not. 
Abstractors know that in a well-organized document, the outline is a structured 
representation of the theme, equivalent to a summary. Therefore, they use the 
outline items as search cues and try to expand them with meaning substance 
from text passages. They also know that different parts of documents differ in 
function. For instance, an introduction is unlikely to present the results of a 
study, whereas the results section will almost certainly do so. Unless the author 
has made a bad mistake, however, the introduction describes what the whole 
article is about. An abstractor going through an introduction watches out for 
such topic statements or summaries, in the same way that (s)he does for the 
methods used when analyzing the methods section. 

In addition, the abstractor knows that longer documents like reports, scien- 
tific articles, or monographs have an information organization which includes 
two (or more) levels: a text body which conveys the basic information content 
and on top of it, an access structure. The top level of document organization is 
most distinct in large texts like monographs, but it exists in journal articles as 
well. It is made up of text components that contain global information about 
the text: table of contents, preface, introduction, conclusion, blurb, index, list 
of references, to name the most common items. Individual chapters of books 
often repeat this two-layer organization. They contain global top-level informa- 
tion in their introduction and conclusion, possibly a table of contents, an index 
and a reference list of their own, and basic information in the text body. All 
information for a summary may be, and indeed should be, present in top-level document 
components. So why read the body text of a document at all? 

Assembling the material for a summary by searching. Summarizing can be 
realized by reading and understanding a document and reducing it to its most 
relevant items. This work-intensive procedure is not compatible with the mere 
exploitation of the source information for a particular task, e.g., for the pro- 
duction of a summary. An alternative to summarizing by understanding and 
reducing information is to assemble the meaning representation for a summary 
by a top-down document exploration. Top-down exploration is supported by the 
outline without being strictly bound to it. The title provides an initial for- 
mulation of the document theme. The summarizer tries to enrich this theme 
information from the text by (implicitly or explicitly) asking questions such as: 
What are the aims? How exactly are they achieved? What are the methods? 
Why? How many test subjects? What results? What impact? and noting the 
answers. Readers may compare this professional questioning strategy to the 
well-known Lasswell formula for news from mass communication. Using this 
exploration technique, the summarizer expands her or his theme knowledge 
and thus obtains the content for the summary. As soon as all essential 
questions about a document type have been answered, the information base of 
the summary is complete. The textual summary can be produced from it. 
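
For readers who think in computational terms, the questioning strategy can be pictured as filling a fixed list of slots. The following Python sketch is an illustration under assumptions only: the slot names paraphrase the questions quoted above, whereas the function name `explore` and the sample title and answers are invented for the example and are not taken from the observed summarizers.

```python
# Illustrative sketch: slot names paraphrase the questions quoted in the text;
# the function name and the sample data are hypothetical.
QUESTIONS = ["aims", "how achieved", "methods", "why",
             "test subjects", "results", "impact"]


def explore(title, answers, questions=QUESTIONS):
    """Collect the answers found so far and report which questions remain open."""
    base = {"theme": title}      # the title gives the initial theme formulation
    open_questions = []
    for question in questions:
        if question in answers:
            base[question] = answers[question]
        else:
            open_questions.append(question)
    # The information base is complete once open_questions is empty.
    return base, open_questions


base, still_open = explore(
    "Communicating between cultures",                  # hypothetical title
    {"aims": "show general difficulties",              # hypothetical answers
     "methods": "Japanese-English example"},
)
```

In this reading, exploration simply continues until `still_open` is empty; which questions count as essential would depend on the document type.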



4.3.4.2 Assessing relevance and recognizing the thematic structure 

Recognizing the theme or the thematic structure of a document, or solving the 
aboutness problem, is one of the classical main tasks of a professional sum- 
marizer. Sometimes the theme is seen as a simple document feature, such as 
being about mice. In a more comprehensive view a theme is an organized se- 
mantic structure that pervades the text and makes it coherent, i.e., the well- 
known macrostructure (see Sect. 3.4). Normally, summarizers need several 
steps to establish its representation. They may restrict their interest to the 
thematic core, reconstructing only the part of the macrostructure which fits into 
the reduced size of a summary. 

The reconstruction of the thematic structure is embedded in relevance 
judgements. This combination may seem astonishing at first glance, but it fol- 
lows from the characteristics of the task, which is to put the most important 
items from the source document into the summary. One prominent factor in 
importance or relevance is belonging to the core thematic structure, to the 
main statements of the document. The closer an item is to the semantic core, 
the higher its importance in the document, and the more stringent the need to 
include it in the summary. In contrast, information items without any ties to the 
thematic kernel of the document are seldom relevant. No item is integrated in 
the core thematic structure of a document without being assessed as relevant. 

The theme - a structured representation of document meaning. As stated 
above, people use different notions of the theme. It characterizes the content of 
a text, e.g., of a school composition. A theme of this sort may belong to an ex- 
isting document or to a document which is still to be produced. The important 
point is that the nuclear theme of the beginnings, the “root” of the thematic 
structure, will grow as it is enriched with meaning components. At the end, the 
theme is a structured representation of a document’s meaning. The additional 
meaning components are linked by semantic relations to the root of the the- 
matic representation. 

Figure 4.19 shows a structured theme representation. We find a root proposi- 
tion big difficulty: communicating effectively between cultures which is accom- 
panied by a paraphrase. Both are related by a restatement relation. An example 
component is linked to the root by an exemplified-by relation, and a component 
describing the observable results of the problem (wrong content, wrong 
structure, wrong presentation) is attached by a cause/effect relation. 
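
A structured theme of this kind can be sketched as a small labeled tree. The Python fragment below is a minimal illustration, not the representation used in the study: the class name `ThemeNode` and the method `attach` are assumptions, and the paraphrase and example nodes carry placeholder texts because their wording is not quoted here.

```python
from dataclasses import dataclass, field


@dataclass
class ThemeNode:
    """One meaning component of the theme (hypothetical class name)."""
    statement: str
    links: list = field(default_factory=list)   # (relation label, ThemeNode) pairs

    def attach(self, relation, node):
        """Link a new component to this node by a semantic relation."""
        self.links.append((relation, node))
        return node


root = ThemeNode("big difficulty: communicating effectively between cultures")
root.attach("restatement", ThemeNode("<paraphrase of the root>"))   # placeholder text
root.attach("exemplified-by", ThemeNode("<example component>"))     # placeholder text
root.attach("cause/effect",
            ThemeNode("wrong content, wrong structure, wrong presentation"))
```

The root stays fixed while components accumulate around it, which mirrors the idea that the nuclear theme grows as it is enriched.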




Fig. 4.19. The thematic structure of the Mackin paper 



Considering the theme as a structured representation of discourse meaning - a 
macrostructure - helps us to understand how a summarizer can identify the 
important core statements in the source document. The summarizer recon- 
structs the thematic representation of the source document in his or her mind. 
(S)he finds in the document candidate thematic statements, checks them for 
relevance according to several criteria including the closeness to the theme, 
and attaches the successful candidates with one or more semantic relation(s) 
to the thematic structure. Thus the thematic structure is expanded until it can 
make up the content of the summary. 

Finding candidate thematic statements in the source document. From a 
practical viewpoint, one way of recognizing the thematic structure of a docu- 
ment exploits the author’s topical expressions (phrases, sentences, summa- 
ries), interpreting them and putting them into the thematic representation. 
From there, the summary will be taken. 

An author has a vested interest in explaining to her or his readers what (s)he 
is talking about, what the core of the meaning complex is that (s)he wants to 
submit to the audience, which findings are the most important, etc. (S)he states 
all this in the text in such a way that it is pointed out to every reader. The 
needed emphasis may be achieved by different means: 

• a privileged position in the title, in a heading, at the beginning of a text 
component etc. 

• special layout features (bold faces, color, italics, ...) 

• rhetorical emphasis (“first and foremost, I want to show...”) 

• repetition, rephrasing 

Most documents are redundant. They offer a choice of author’s formulations of 
kernel statements. They can all be exploited to improve and expand the sum- 
marizer’s theme representation. By comparing several formulations, taken pos- 
sibly from distant parts of the document (e.g., one from the preface, one from 
the conclusion, or one from the introduction, one from the table of contents), 
the summarizer can check her or his understanding of the theme and achieve 
informational safety by interpreting little, but well-chosen material. 

Expansion of the thematic structure with relevant statements. Every infor- 
mation item from the document which has been recommended as a possible 
theme statement may be evaluated by additional strategies. They may shed 
light on its information value, its theoretical status, and so on. Then relevance 
by closeness to the theme (centrality) is established by trying to find a 
semantic relation that attaches the item to the thematic structure. 

For instance, in Fig. 4.19, wrong content, wrong structure, wrong presenta- 
tion is part of the theme because it links up to the item big difficulty: communi- 
cating effectively between cultures with a well-known relation, namely the 
causal relation. A causal relation can be established by looking up in a knowl- 
edge base that the relation exists. 

Figure 4.20 deals with restatement relations. Restatement relations are par- 
ticularly interesting because, from a more technical perspective, an in-text 
topic sentence or summary is a restatement or an elaboration of the theme 
statement, i.e., its valid but possibly enlarged representation. We can test a 
candidate topic sentence or summary by comparing it to the known theme 
representation. It must contain all essential elements, thus justifying a 
restatement relation. The summary must bring some additional information. It links up 
to the theme with an elaboration relation. 

Example: Installing restatement relations. A simple propositional represen- 
tation (Fig. 4.21, example from the SimSum system) helps us to understand the 
attachment by semantic relations better. In this representation format, we find 
numbered individual propositions. Formally they are built like predicates used 
in predicate calculus. They consist of a predicate and its ordered arguments. 
Every argument position has a specific semantic role as defined in the 
knowledge base that lies behind the propositions. In addition, the knowledge 
base contains records for the individual concepts. The records are not presented 
here in detail, but sometimes we have to talk about them because the 
strategies consult them. Numbers inside a proposition represent an embedded 
proposition. For example, proposition 4 is embedded in proposition 5. We find 
two types of propositions: the domain propositions that state facts about the 
domain of discourse - here about communication problems - and the interac- 
tion propositions that talk about the communication between author and reader. 
Commas separate different positions of arguments. Some positions may remain 
empty. 
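
The format just described can be written down as a simple data structure. The following Python sketch is an assumption-laden illustration rather than the SimSum implementation: the class name `Proposition`, the field names, and the helper `embedded` are invented, and the sample values follow propositions 1, 4, and 5 of Fig. 4.21. Empty argument positions are marked with `None`; the semantic roles of the positions are left unnamed because they are defined in the knowledge base, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A numbered proposition: a predicate with ordered argument positions."""
    number: int
    kind: str        # "domain" or "interaction"
    predicate: str
    args: tuple      # None marks an empty position; integers refer to propositions


props = {
    1: Proposition(1, "domain", "communicate",
                   (None, None, "content/wrong/", None, None, "fail", "possibility")),
    4: Proposition(4, "interaction", "examine",
                   ("author", (1, 2, 3), None, None, None, "in turn")),
    5: Proposition(5, "interaction", "introduce",
                   ("author", "articles_significant", 4)),
}


def embedded(prop):
    """Return the numbers of the propositions embedded in prop's arguments."""
    found = []
    for arg in prop.args:
        if isinstance(arg, int):
            found.append(arg)
        elif isinstance(arg, tuple):
            found.extend(a for a in arg if isinstance(a, int))
    return found
```

Calling `embedded(props[5])` yields the reference to proposition 4, which in turn embeds propositions 1 to 3.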

Finding a restatement relation involves a knowledge base and some 
inferencing. Let us look again at a text passage from the Mackin paper: 

In this article, I use the problems involved in communicating across the barrier 
of the Japanese language and culture into English as a general example of diffi- 
culties that may be inherent in any cross-cultural translation. 

The theme (see Fig. 4.19) 

big difficulty: communicating effectively between cultures 
[... is-difficult, communication, culture1, culture2] 

reappears twice in the above-cited text passage that the author heavily empha- 
sizes: 

problems involved in communicating across the barrier of Japanese language 
and culture into English 

[... has-problems, communication, Japanese-language, English-language...] 

difficulties that may be inherent in any cross-cultural translation 
[... is-difficult, translation, culture1, culture2] 

Figure 4.20 explains how the restatement relation can be identified. It presup- 
poses the formalized semantic representation using predicates as shown in Fig. 
4.21. We look at three propositions, the first stating a theme proposition and 
the second and third candidates for attachment: 

1. [... is-difficult, communication, culture1, culture2] 

2. [... has-problems, communication, Japanese-language, English-language...] 

3. [... is-difficult, translation, culture1, culture2] 

Fig. 4.20. Installing restatement relations 



1 domain_communicate ( , , content/wrong/, , , fail, possibility) 

2 domain_communicate ( , , structure/wrong/, , , fail, possibility) 

3 domain_communicate ( , , representation/wrong/, , , fail, possibility) 

4 interaction_examine (author, [1, 2, 3], , , , in turn) 

5 interaction_introduce (author, articles_significant, 4) 

“Communication can fail in three ways: wrong content, wrong 
structure, or wrong presentation. I now examine each of these 
problems in turn. For each of these problem areas, I introduce 
some significant articles...” 



Fig. 4.21. Simple propositional representation with corresponding surface text 

The restatement relation between propositions 1 and 3 can be seen after only 
one inference. The inference looks into the knowledge base and finds that 
translation is a sort of communication (compare Fig. 4.20). After subsuming 
translation under communication, both expressions are equal. The restatement 
relation is justified. 

Discovery of the restatement relation between propositions 1 and 2 requires 
more look-ups in the knowledge base: has-problems must be found as a synonym 
of is-difficult, the Japanese and English language must be known as parts of 
their respective cultures, and the Japanese and English cultures must be recog- 
nized as two sample cultures among the cultures known to the knowledge base. 
After these simple inferences, which do no more than pick up concepts, look 
up the respective records in the knowledge base, follow is-a, synonym, and 
part-of links if needed, and return concepts that match the theme if they find 
any, the restatement relation between propositions 1 and 2 can also be estab- 
lished. 
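
The look-ups described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SimSum code: the miniature knowledge base `LINKS` and the function names `reachable` and `restates` are invented, the elided leading positions of the propositions are dropped, the indices of culture1 and culture2 are ignored so that both positions read simply `culture`, and synonym links are followed in one direction only.

```python
# Hypothetical miniature knowledge base. The link types (is-a, synonym, part-of)
# follow the text; the concrete entries are assumptions made for illustration.
LINKS = {
    "translation": [("is-a", "communication")],
    "has-problems": [("synonym", "is-difficult")],
    "Japanese-language": [("part-of", "Japanese-culture")],
    "English-language": [("part-of", "English-culture")],
    "Japanese-culture": [("is-a", "culture")],
    "English-culture": [("is-a", "culture")],
}


def reachable(concept):
    """Concepts reached from `concept` by following is-a, synonym, and part-of links."""
    seen, todo = {concept}, [concept]
    while todo:
        for _relation, target in LINKS.get(todo.pop(), []):
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen


def restates(candidate, theme):
    """True if every position of `candidate` leads back to the theme's concept."""
    return len(candidate) == len(theme) and all(
        t in reachable(c) for c, t in zip(candidate, theme)
    )


theme = ("is-difficult", "communication", "culture", "culture")   # proposition 1
prop2 = ("has-problems", "communication", "Japanese-language", "English-language")
prop3 = ("is-difficult", "translation", "culture", "culture")
```

With these entries, `restates(prop3, theme)` succeeds after a single is-a look-up, while `restates(prop2, theme)` needs the synonym, part-of, and is-a links, in line with the difference in effort noted above.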



4.3.4.3 Summary production by cutting and pasting 

Their professional role tells abstractors to avoid inventing anything. They fol- 
low the author as closely as possible and reintegrate the most important points 
of a document in a shorter text. For this reason, we can roughly characterize 
their text production style as copying relevant text items from the original 
document, and reorganizing them to fit into a new structure, often with the 
help of standard sentence patterns. Seen more from inside the brain, sum- 
marizers work with the representations of document knowledge they have ac- 
quired by document exploration and interpretation. At the time of writing, the 
content-related representations - theme knowledge and document structure 
knowledge - are exploited. If the memory representation of the document sur- 
face (formulation and layout) is too poor to reconstruct the wording, the sum- 
marizer typically returns to the original and copies the interesting passage, 
adapting it to its new context. 

Producing summary statements. Two main steps are necessary in order to 
produce a summary statement: 

• the meaning items chosen from the representation must be put in a seman- 
tic format that can be expressed by an output sentence (text planning) 

• the sentence plan must be given a linguistic surface expression (formula- 
tion) 

Professional summarizers have to write the topic statements of abstracts over 
and over again: about the aim of the research, the methods, the results, their 
evaluation, their application and so on. Thus they assemble a pool of standard 
patterns that accommodate varying concept configurations taken from the cur- 
rent document representation. Every semantic configuration that respects the 
definitions can be subsumed by the format. Figure 4.22 presents such a general 
sentence format. It may read “phenomenon x in the domain y produces effect 
z” or “effect z of problem x in the context y is investigated”. This is a typical 
semantic format of a topic sentence. When following the cutting and pasting 
technique, summarizers are assumed to choose topic sentence plans of this 
style from their pool of standard patterns and to accompany the standard se- 
mantic format with a standard formulation such as “x is examined for 




Fig. 4.22. Common standard topic sentence format 



Figure 4.23 gives an instantiated version of the topic sentence format. The 
nodes have been filled with meaning objects, taken from the thematic repre- 
sentation of the document. The relations between the meaning items have been 
refined. Their labels ascribe a role to the node at which they point. The 
sequence of the meaning items is still open. 



[Figure: the topic sentence format instantiated with the nodes “problems in 
communicating across the barrier of the Japanese language and culture into 
English”, “wrong content, wrong structure, wrong presentation”, “big 
difficulty: communicating effectively between cultures”, and “difficulties 
inherent in any cross-cultural translation”; the relation labels are 
cause/effect, context, domain, and restatement.] 

Fig. 4.23. Instantiated version of the topic sentence format 



Before looking at working step Mackin-13 from where the example is drawn, 
readers are invited to compare Fig. 4.23 to the output sentence which has been 
derived from it: 

The effects of wrong content, structure or presentation on effective 
communication across barriers of Japanese language and cultures into English are 
examined for their general [ ] in any cross-cultural translation. 

The semantic roles of the topic sentence pattern together with its standard for- 
mulations and the lexicalized concepts from the theme representation reap- 
pear. In the exhibit Mackin-13 the external input shows the original passages 
from the source document to which the summarizer returns for the precise 
wording for his topic sentence. Copying, i.e., a one-to-one assignment of ready- 
made formulations to concepts from the representation, accounts for almost the 
whole output sentence. Reorganization is restricted to changing word classes 
(“communicating” to “communication”, “effectively” to “effective”). 
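
Cutting and pasting of this kind resembles filling a sentence template. The Python sketch below illustrates the idea under assumptions: the wording of `PATTERN` is a hypothetical standard formulation modeled on the formats quoted above, the slot names are invented, and the slot fillers are copied from the output sentence. It does not reproduce the summarizer's actual pattern.

```python
# Hypothetical standard pattern; the slot names (effect, problem, context) are
# assumptions modeled on the roles named in the text.
PATTERN = "The effects of {effect} on {problem} are examined in the context of {context}."


def formulate(pattern, slots):
    """Fill a standard sentence pattern with formulations copied from the source."""
    return pattern.format(**slots)


slots = {
    "effect": "wrong content, structure or presentation",
    "problem": ("effective communication across barriers of Japanese language "
                "and cultures into English"),
    "context": "cross-cultural translation",
}
sentence = formulate(PATTERN, slots)
```

A missing slot raises a `KeyError`, which corresponds loosely to the situation in which the summarizer has to return to the original document for the wording.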



4.3.5 Why and how natural summarizing examples are presented 

Knowledge about summarization realia. In the following, a selection of working 
steps and sequences is presented. They demonstrate in detail how 
summarization strategies work. We look at summarization realia (i.e., pieces 
of evidence) in the same way as other disciplines consider the anatomy of the 
human body, excavation sites of prehistoric settlements, or preparations of 
exotic butterflies. In many disciplines the interest in the empirical reality of 
their domain is a matter of course - why not transfer this attitude to the investi- 
gation of professional summarizing? Knowing the objects we are talking about 
is always good scientific style. Those who consider computational summariz- 
ing will in addition see the practical advantages of empirical knowledge. The 
basic argument is that it is more practical to execute an object or an operation 
according to an example that works and that is understood - i.e., to simulate it 
- than to invent it anew. Reverse engineering is easier, but it requires knowl- 
edge of the given reality. 

Concrete knowledge is often bound to specific situations or contexts. This is 
evident when it comes to building a house. If all we have is a plan of a well-ar- 
ranged stock of individual components of a house and a construction specifi- 
cation, and nobody knows what a finished house looks like or has ever touched 
a brick, it will be extremely difficult to construct a house simply on the basis 
of the components and the specification. In this case, a holistic image of the 
target product, of assembled subcomponents, and recipes for subtasks of house- 
building such as setting up a roof, improve our chances of success. The same 
holds for other complicated processes, among them understanding summariza- 
tion and computational summarizing. 

In the case of a cognitive process as complicated as professional summariza- 
tion, we need holistic and context-bound knowledge from suitable observa- 
tional units. For the following reasons, working steps are promising units for 
observation: 

• Working steps are reasonably self-contained small modules. Although 
many cognitive operations may occur during a working step, steps are 
much easier to analyze than larger units like complete working processes. 

• Working steps are accessible to direct observation, whereas other features 
of the process must often be constructed from observational material and 
background theories with the help of modeling methodology. 

• Natural working steps demonstrate intact and competent human behavior. 
An important part of human know-how lies in the skilled combination of 
different strategies. Only in working steps can we observe how successful 
summarizing acts are performed and how several pieces of know-how are 
integrated to achieve a cognitive aim. 

Sequences of working steps give small-scale information. They are too small in 
size to demonstrate wide-sweeping features of process organization. Directly 
visible in a working step are its inner cooperative structure and the fringes of 
metacognitive activities which creep in. The observed working steps focus on 
foreground productive activities, while metacognitive planning and control are 
relegated to the background. 

In spite of these limitations, some characteristics of process organization can 
be observed in sequences of working steps. Some examples: 

• A sequence deals with process planning and thus reveals what the macro- 
scopic organization looks like (see Hearn sequence). 

• A sequence shows a sample of the characteristic large-scale process or- 
ganization. So does the Trueby sequence. The Trueby overview (Fig. 4.34) 
shows that indeed the online abstracting behavior in the sample pervades 
the whole summarizing process. 

• Sometimes, subplans are executed discontinuously. Discontinuous sub- 
plans can be seen in the Sperl (see Fig. 4.62) and Goonatilake sequences. 

The selected working steps and sequences. As natural objects, the selected 
working steps and sequences have many features. To give readers some orien- 
tation, Fig. 4.24 assigns the segments from working processes which are dis- 
cussed in detail to the main issues which they illustrate. For readers who prefer 
to study summarization processes from the examples simulated by the SimSum 
system, the book description provides backup material and a more extensive 
empirical discussion. 

The form of presentation. The observed working steps are presented in 
exhibits accompanied by a comment. In the comment, the first paragraph (in 
italics) summarizes briefly what happens in the step. It functions as a preview 
for readers who want to study the current step and as a local summary for 
readers who decide to skip the naturalistic detail of summarizing. Since 
readers are not expected to go through all the empirical working steps from 
beginning to end, but to use them selectively as in an exhibition catalog, the 
depth of description varies, so that everybody can quickly find a step 
discussed in detail that can serve as a starting point. While the main aim of 
the comment is to interpret the observed step in terms of the summarizing 
model (i.e., with the concepts of the grounded theory of professional summariz- 
ing), the interpretation may occasionally also include background information, 
for instance about the use of a move, the intentions of the summarizer, or 
about alternative interpretations. This makes the account more vivid and easier 
to follow. 



[Figure: subtasks (labels not preserved) illustrated by the following working 
steps and sequences] 

Judge-3: Let me see what the article is about 
Mackin sequence: Recognizing the document theme and drafting the topic sentence 
The Trueby sequence of online abstracting 
Hearn sequence: How a document type-specific working plan is developed and applied 
Black sequence: Professional document use with dynamic reading 
Goonatilake sequence: Dynamic reading techniques 
Mills-15: Assigning a classification notation 
Rada sequence: Pragmatic indexing techniques 
Sperl sequence: Incremental indexing 

Fig. 4.24. Overview of real-world expert summarization samples 



Readers will find a general introduction to all sequences of steps. In two cases 
(the Trueby and the Sperl sequence), the introduction includes a process over- 
view diagram whose form is explained below. 

The form of exhibits. All exhibits conform to the common structure presented 
in Fig. 4.25. Readers can best combine Fig. 4.25 and any data-carrying exhibit 
to see how the working steps are presented. 

In an exhibit, the state of knowledge at the beginning of the working step 
(the input into the process) is displayed in the top windows, the process 
information itself appears in the middle, and the state reached at the end of the 
step (its output) is presented at the bottom. The display thus shows what 
happens during a working step. Arrows roughly hint at the two most important 
reading directions. Vertically we go from the start state of the step to its end state. 
Horizontally, from left to right, we begin with shallow surface representations, 
pass through the medium-deep representation of the document scheme, and 
end up with the deepest representation, the document theme. 

Global schemata for input and output. As discussed above (Sect. 4.3.3), in- 
put and output data of a working step are organized by cognitive schemata. At 
the top of the exhibit, we normally find up to three windows representing the 
professional standard views on the document. They display the state of task-re- 
lated areas from the summarizer’s memory at the moment when (s)he starts 
the new working step: the document’s surface representation, the document 
scheme containing the outline and attached information, and the document 
theme where the topic structure of the document is built up during the working 
process. Below the top row of schemata, the window for external input supplies 
the data that will be processed during the working step, i.e., the text read. 
There, we find underlined (or in bold fonts) what the summarizer actually 
reads (compare the thinking-aloud protocol segment in the processing area). At 
the bottom of the exhibit, the task-oriented document schemata display the 
state of the summarizer’s knowledge at the end of the working step. Sometimes 
written external output exists. It may take the form of markings added to the 
source text, of notes or of an abstract passage. What is externalized appears in 
the output window. Summary representations show up as soon as needed. 
Despite being smaller in data size, the summary area mirrors the organization 
of the larger document representation. 

It is normal to have external data mirrored by an internal representation. The 
document surface representation stores an image of the document as far as it 
has been perceived, and the summary is represented in a memory area that 
keeps track of its state and of passages that have been planned or written, or 
are under construction. We assume that in the case of external input, the in- 
ternal representation is the result of perception, whereas in the case of writing, 
the internal representation feeds the external version. 

Schemata (windows) which do not yet contain data are not represented in 
the exhibits, because they are assumed to play no role in the person’s aware- 
ness, and because we are short of space. So the reader may observe that a 
document theme area is missing in the upper part of Fig. 4.25, whereas it fig- 
ures in the output description at the bottom of the display. Only in the course of 
the working step does the summarizer develop a first guess as to its content. At 
this moment, the document theme window pops up because it becomes instan- 
tiated with the initial data. A summary area will appear as well and store the 
relevant material as soon as the first idea of the summary emerges. 

Most of the time, a window can only show the information currently used, and
even this must be abridged due to lack of space. The reader may also some- 
times notice gray boxes inside the windows representing cognitive schemata. 
They indicate areas of activation inside the memory representation. Knowledge 
acquired during the current working step is also thought of as activated and 
marked by a gray box. Square brackets include text passages which are known. 
To save space, the passages are characterized by their beginning and their end 
only. Since documents are more often than not read selectively, gaps in the 
surface representation are normal. 

The process description. In the middle of the exhibit, the pertinent passage 
from the thinking-aloud protocol gives the summarizer’s description of the cur- 
rent activities. Here we find spoken English discourse, including some trial and 
error in thinking and talking as is usual under thinking-aloud conditions. In ad- 
dition to the summarizer’s words, the protocol may record pauses (the more 
dots, the longer the pause) and characterize intervening noises or third parties. 
Noises are interpreted if they are clear (e.g., the noises of a zip opener, of an 
underlining, or of a phone ringing in the background), otherwise they are me- 
rely noted as noises. 

The treelike structure above the thinking-aloud protocol gives a sketch of the 
cognitive apparatus (or the cognitive program system) at work behind the 
observational data. It explains which strategies must be active and how they 
cooperate to reach the aims of the current working step. All the strategies that 
occur are defined in the intellectual toolbox. Strategies at the root of the tree 
plan, restrict, and monitor, whereas at the leaves we find strategies that deal 
with text and its meaning. Strategies put in boxes are thought to cooperate 
more closely than others. The treelike structure sketches the functional 
structure of a working step with limited precision. Among other things, it does 
not explain very much in what sequence and how frequently individual 
strategies apply. Sometimes, a segment of the tree is put into a dotted box in 
order to attract the reader’s attention to it. 

The technical form of process diagrams. In order to give readers an overall 
orientation in the working processes that encompasses more than the few 
working steps chosen for a detailed discussion, the Trueby and the Sperl se- 
quence (see below) are accompanied by process diagrams of the whole pro- 
cess. Process diagrams are structured like an abacus (compare Fig. 4.26). The 
abacus rods serve as representations through which information items are 
moved during summarization. The obvious source representation is the original 
document. From there, information is picked up and represented in memory 
during exploration and understanding. We distinguish two internal (memory) 
representations of the source document: a surface representation which records 
what has been remembered from the wording and the layout, and a deeper
encoded text knowledge representation. Units transferred to the document
knowledge representation have been judged relevant. Often, notes as external store
accompany and back up the internal representation. In the next representation 
(“product knowledge”) the summarizer puts information units that (s)he 
decides to use in the summary. While this product-oriented representation is in 
the summarizer’s mind, the draft product is written on a sheet of paper. Since 
there may be any number of amendments to a draft, they are considered to be 
rewritten into the draft. Intermediate versions are all mapped to states of the 
first draft, in order to avoid a multiplication of copies and representations. Only 
the final copy is given as a representation in its own right. 

The figure records how information is transferred from one representation to 
the next by the working steps. The abacus rods - the representations - carry 
beads which represent information items. A bead receives its number from the 
working step that deals with it. Bead 1, for instance, is the information pro- 
cessed by working step 1. It is transferred from the source document to the 
document surface representation and not further. This means that it has not 
been submitted to relevance assessment or that it has been found irrelevant. In 
contrast to bead 1, the information processed by working step 2 (i.e., bead 2) 
has not only made its way through the relevance assessment into the document 
knowledge representation but it has also penetrated straight into the draft 
summary. Working steps may copy the items processed and deposited in mem- 
ory by other steps and re-use them under their own number. For example, step 
4 copies the result of step 3 and reworks it under its own trademark. Working 
steps that are missing do not contribute to knowledge processing. Normally 
they are dedicated to planning or other metacognitive activities or to socializ- 
ing. 
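
The bead-and-rod bookkeeping described above can be sketched as a small data structure. The following Python fragment is only an illustration: the rod names follow the representations named in the text (notes as an external store are left out), while the function names and the way beads are stored are assumptions made for this example and not part of the model.

```python
# Rods of the abacus, ordered from the source to the final product.
RODS = [
    "source document",
    "document surface representation",
    "document knowledge representation",
    "product knowledge",
    "draft summary",
    "final copy",
]

# Bead number = number of the working step; value = last rod the bead reached.
# Beads 1 and 2 follow the description in the text.
beads = {
    1: "document surface representation",  # perceived, but not taken further
    2: "draft summary",                    # went straight into the draft
}


def bead_path(last_rod):
    """Return every rod a bead has passed, up to and including last_rod."""
    return RODS[: RODS.index(last_rod) + 1]


def passed_relevance_assessment(step):
    """A bead counts as relevant once it reached the knowledge representation."""
    return "document knowledge representation" in bead_path(beads[step])


def contributing_steps(first, last):
    """Working steps in [first, last] that moved a bead.

    Steps without a bead are, for example, planning steps.
    """
    return [step for step in range(first, last + 1) if step in beads]
```

In this sketch, `passed_relevance_assessment(1)` is false and `passed_relevance_assessment(2)` is true, which matches the reading of beads 1 and 2 given above.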

The connection between the overall knowledge processing diagram and the 
presentation of individual working steps can be checked by looking up any 
working step in the process diagram. In the overview, arrows mark steps which 
are discussed in detail (compare Figs. 4.34 and 4.62). 




Fig. 4.26. Illustrative segment from a process diagram 

4.3.6 Real-world summarizing steps and sequences

4.3.6.1 Working step Judge-3: “Let me see what the article is about”

Summarizer: Harold Borko, Los Angeles, CA 

In the step Judge-3, Harold demonstrates in a picture-book style the intellectual behav- 
ior of the abstractor who right at the outset establishes the first connection between the 
title and the document text. In this way he learns about the article's subject 
("aboutness") and begins the expansion of the title information by linking units of 
meaning from the text to the theme through semantic relations. 

In the two preceding working steps, Harold read the title of the article by An- 
thony Judge, Representation of sets - the role of number, and discovered that the 
article begins with an outline listing the chapter headings. 

Starting, control, and planning. Harold now begins to explore the text: “Let 
me see what the article is about” (compare protocol segment in Fig. 4.27). 
With this polyfunctional statement, he not only gives himself a start signal and 
commences with the first working step but also determines what he intends to 
do next and poses the typical abstractor’s aboutness question at the beginning 
of a working process. A group of different strategies reveal their activity in this 
statement: 

• start-explore begins the text exploration sequence, i.e., a series of working 
steps dedicated to information seeking in the source document 

• explore controls an exploration step, here the first one of the sequence 

• via the strategy plan Harold states what he intends to do next, namely find 
the theme of the paper 

• question is the strategy with which the summarizer asks his standard ques- 
tions, among them the well-known question “what is the document about?” 

The four strategies initiate document exploration. Their aim is fixed, namely to 
come up with more information about the subject of the article. 

Exploration. From the external input window and the protocol segment, the 
reader sees that Harold indeed starts to read. Reading includes first of all an 
operation which copies information from an external source into a memory 
area. This function is executed by the read-form strategy (see Fig. 4.27). It
accounts for plain reading. However, this is only a small part of an expert
summarizer’s exploration behavior. At a higher strategic level, Harold demonstrates
information acquisition intentions when exploring the beginning of the 
Representation of sets article. He applies an open acquisition attitude, accept- 
ing for understanding what the text offers (strategy browse). 

Fig. 4.27. Working step Judge-3: “Let me see what the article is about”

Attention is high because the summarizer is aware that he is at the beginning
of the document, where the search for thematic statements is often fruitful 
(strategy first - explore beginnings of units). 

Indeed he finds “in the very beginning” of the document a topic sentence
introduced by a promising key phrase: “This paper is concerned with the
widespread tendency to present arguments, or to formulate insights or conclusions,
in a series of points.” This is a typical constellation. 

Relevance assessment and expansion of the theme. Meaning items that be- 
long to the thematic structure of a document must be relevant. Their relevance 
is almost always supported by a couple of strategies. We engage a specific 
strategy to check for every standard feature that conveys relevance to a state- 
ment. These strategies may argue from divergent viewpoints, but they always 
include a strategy that judges closeness either to the thematic structure or at 
least to an outline item. If the relevance assessment strategies agree on the 
importance of the statement under question, it is encoded by the hold strategy 
in the theme and / or scheme representation of the document, linking it with an 
appropriate semantic relation to the preexisting thematic structure. 

In the present case, Harold can fall back on four strategies for determining 
the relevance of 

This paper is concerned with the widespread tendency to present arguments, or 
to formulate insights or conclusions, in a series of points. 

• relevant-unit reacts to the position of the statement at the immediate be- 
ginning of the text. What the author has placed there is often a topic sen- 
tence. 

• relevant-texthint recognizes the indicator phrase “this paper is concerned 
with” which clearly hints at an important statement. 

• relevant-call manages to relate the interesting statement to the thematic 
core by installing a restatement link. 

• relevant-topic-sentence insists on the relevance of topic sentences found in 
the document. They can be identified from the author’s indicator phrase 
(“this paper is concerned with”) and from the successful binding to the 
thematic structure by relevant-call.

As a result of relevance assessment and encoding, the thematic structure 
known to the summarizer has been expanded (see the upper and the lower 
theme window of Fig. 4.27). 
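
The interplay of the four relevance strategies and the hold strategy in this step can be sketched in a few lines of Python. This is a toy illustration under strong assumptions: the dictionary PARAPHRASES stands in for the background knowledge and inferencing that relevant-call needs, the position index and the function signatures are hypothetical, and real relevance assessment is far less mechanical.

```python
# Hypothetical background knowledge: title concept -> phrases that restate it.
PARAPHRASES = {"sets": ["series of points"]}

TITLE = "Representation of sets - the role of number"
STATEMENT = {
    "text": ("This paper is concerned with the widespread tendency to present "
             "arguments, or to formulate insights or conclusions, "
             "in a series of points."),
    "position": 0,  # assumed index: first sentence of the document
}


def relevant_unit(stmt):
    """Position cue: what stands at the very beginning is often a topic sentence."""
    return stmt["position"] == 0


def relevant_texthint(stmt):
    """Indicator-phrase cue."""
    return "this paper is concerned with" in stmt["text"].lower()


def relevant_call(stmt, title):
    """Try to bind the statement to the thematic core; return a relation or None."""
    text = stmt["text"].lower()
    for concept, paraphrases in PARAPHRASES.items():
        if concept in title.lower() and any(p in text for p in paraphrases):
            return "restatement"
    return None


def relevant_topic_sentence(stmt, title):
    """A topic sentence needs the indicator phrase and a link to the theme."""
    return relevant_texthint(stmt) and relevant_call(stmt, title) is not None


def hold(theme, stmt, title):
    """Encode the statement in the theme only if all strategies agree."""
    relation = relevant_call(stmt, title)
    votes = [
        relevant_unit(stmt),
        relevant_texthint(stmt),
        relation is not None,
        relevant_topic_sentence(stmt, title),
    ]
    if all(votes):
        theme["links"].append((relation, stmt["text"]))
    return theme


theme = hold({"root": TITLE, "links": []}, STATEMENT, TITLE)
```

After the call, the theme holds one restatement link from the title to the first sentence of the paper. A statement from the middle of the text without an indicator phrase would leave the theme unchanged.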

“This is his sets”, Harold notes. He has detected the restatement relation that 
links the first statement of the paper to the paper’s title. Then he clarifies this 
by means of a conclusion of his own (strategy inference), formulates the im- 
portant statement in his own words (strategy relevant) and underlines the core 
of the respective text passage in the original (strategy underline - see external 
output window). 

4.3.6.2 The Mackin sequence: Discovering the theme
and writing the topic sentence 

Summarizer: Edward Cremmins, Rockville, MD 

In the Mackin sequence, we discuss the steps 1, 4, 5, 7, and 13 from a summa- 
rizing process comprising 104 steps, dealing with a remarkably well-organized 
paper about intercultural technical communication, written by John Mackin: 
Surmounting the barrier between Japanese and English technical documents. The 
main point of interest is the way the summarizer finds the theme of the paper. 
The reader can observe the growth and consolidation of the thematic structure 
until, in step 13, Edward drafts the topic sentence of the abstract, after the due 
informational safety checks. Edward habitually ignores the title of a document
and tries to find out on his own what the topic of the document is, starting out
with no preconceived ideas. Thus the reader can observe how a summarizer
obtains the theme information quickly, using exclusively data from the text itself.
This inductive approach contrasts with the top-down extension of the title de- 
monstrated by Harold Borko in the Judge-3 step. 

Overview of the presented working steps. We discuss the following working 
steps in detail: 

• Mackin-1. The summarizer is developing a first guess about the topic 
without using the title. 

• Mackin-4 brings the discovery of the document theme. 

• Mackin-5 shows Edward expanding his thematic knowledge. 

• Mackin-7 demonstrates an informational check of the theme knowledge 
against the outline. 

• Mackin-13. Edward drafts the topic sentence of the abstract.

In our presentation, we skip steps 2 and 3, where Ed goes on reading the next 
paragraphs without finding anything noteworthy. We show only one exemplary 
working step of informational safety checking - step 7 - and omit the others 
(steps 6, 8-11). In step 12 (skipped) the summarizer decides to start writing 
and plans how to go about it. 

In the exploration steps 1, 4, and 5 we must examine above all the relevance 
judgements in order to understand how the summarizer constructs his represen- 
tation of the theme. We occasionally also comment upon document exploration 
strategies in order to show their contribution to the discovery of the thematic 
structure. In step 7 we see strategies that compare and check text parts that 
should contain analogous information. Step 13 is devoted to production. The 
reader can observe target text construction and convince her- or himself that 
the theme representation built up in the previous steps actually works well as a 
basis for the topic sentence of the summary. 

Note that in the Mackin exhibits, underlined passages mean passages under-
lined by the summarizer and not what he has read (as elsewhere in this book).
Passages that have been read are printed in boldface instead. 



Working step Mackin-1: Starting into a new document 



Working step 1 shows how the summarizer approaches the document with his general 
methods background and finds a potential theme paraphrase. 

At the start of a working process, the summarizer is equipped with methods
knowledge, but he knows nothing about the wording or the topic of the document.
A summarizer’s professional knowledge of document types and methods
prevents any bewilderment. Even when Ed playfully disregards the title information
proper (see below), he still gets top-down guidance from the current outline
item.

The initial state of poor information is illustrated by the text representations 
at the top of the working step exhibit. They are prestructured by expectations 
but empty. In contrast, the representations at the bottom of the exhibit are filled 
with the details that have been acquired during the first working step. In 
particular, the theme representation contains an initial entry. 

Initialization. Since in the current working step Edward begins a new summa- 
rizing process and does not yet know anything about the document, he must 
follow a standard initialization procedure. As a rule, articles are known to be- 
gin with introductions. Consequently, Ed looks for an introduction when ap- 
proaching the beginning of the article. In spite of the missing heading 
“Introduction”, he subsumes the material he finds (“reading introductory mate- 
rial” - see protocol segment in Fig. 4.28) under the corresponding label intro- 
duction proposed by the general structure of journal papers (strategy label). 
This allows him, by the way, to situate himself in the working plan and to an- 
nounce the next activities (strategy plan). After these preparatory activities, Ed 
can start document exploration. 

Exploration. In almost every exploration act, a professional summarizer de- 
cides anew what (s)he wants to learn from the document. In the present case, 
Edward can rely on his document structure knowledge (strategy by-form) and 
choose the introduction as a starting point for his document exploration. This is 
a good choice because by looking at the introduction, the summarizer realizes 
simultaneously a second intention of skilled information acquisition: he keeps 
to a high-level text component such as an introduction or a conclusion (in con- 
trast to body text - strategy top-level). 

As he starts reading at the beginning of the paper, Ed intensifies attention 
(strategy first - pay special attention to beginnings). He expects the authors to 
explain to their readers right at the beginning what they intend to do. Edward 
restricts his information intake to digestible units given by the input (strategy
unit), as is good default policy. During his reading, he applies the standard
reading intention for complete understanding as defined by the strategy browse. 
As soon as the summarizer’s exploration attitude is defined, relatively simple 
optical reading strategies can provide the desired input from the source docu- 
ment (strategies read-form and read). 

Relevance assessment by text-driven strategies. We assume that the whole 
paragraph has been copied into the document surface representation. It contains 
in particular “One of the biggest difficulties is that of communicating effec- 
tively between cultures”, a statement which will emerge as a first hypothesis 
of the paper’s subject. The passage is submitted to the relevance strategies. If 
they can agree upon its importance, it is encoded in the scheme and / or 
theme representation and kept for later use. 

Their argumentation looks as follows: 

• relevant-unit accepts all items as potentially relevant that have their place 
at the beginning or the end of a text unit. It distinguishes levels of textual 
organization. In the present case, it scores twice: the whole paragraph is 
relevant because it begins a macrounit (the paper and its introduction), 
and because inside the paragraph first and last items are more relevant 
than the middle. Thus the statement “One of the biggest...” is judged posi- 
tively. 

• relevant-scheme accepts every statement as potentially relevant which is 
in the range of an outline item and can therefore be linked to it. The way 
relevant-scheme argues is shallow, but the strategy is practically useful, 
especially when the summarizer does not want to or cannot invest more 
cognitive work. Since the first paragraph elaborates the “introduction”, the 
paragraph is evaluated positively. 

• relevant-texthint knows that authors’ indicator phrases mark important 
statements and listens to them. “One of the biggest difficulties is” is such 
an indicator phrase. It hints at its scope “communicating effectively bet- 
ween cultures”, the possible theme phrase. 

The most supported statement is “communicating effectively between cul- 
tures”. The strategy relevant-texthint (or the author himself with his indicator 
phrase) gives the decisive hint. Edward has found his first possible formulation 
of the topic. He underlines the respective passage in the original paper and thus 
confirms the analysis given here. Since relevant-texthint cannot propose a spe- 
cific thematic relation in the theme representation, the hold strategy attaches 
the new item by a candidate restatement relation to the root of the theme. 
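
The way relevant-unit "scores twice" on two levels of textual organization can be illustrated by a short Python sketch. The function name, its parameters, and the index values in the example are assumptions chosen for illustration. The text does not say at which exact position the statement stands inside the paragraph, only that it is judged positively.

```python
def relevant_unit_score(paragraph_index, n_paragraphs, sentence_index, n_sentences):
    """Count the levels at which an item sits at the boundary of a text unit.

    Level 1: the paragraph begins or ends a macrounit (e.g., the introduction).
    Level 2: the sentence begins or ends its paragraph.
    """
    score = 0
    if paragraph_index in (0, n_paragraphs - 1):
        score += 1
    if sentence_index in (0, n_sentences - 1):
        score += 1
    return score


# Hypothetical positions: first paragraph of the paper, last of four sentences.
score = relevant_unit_score(0, 10, 3, 4)
```

With these assumed positions the statement receives support on both levels, while a sentence from the middle of a middle paragraph would receive none.
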

Fig. 4.28. Working step Mackin-1: Starting into a new document

In the document scheme representation at the bottom of the exhibit, a heading
“introductory material” has shown up, and the statement “big difficulty: com- 
municating effectively between cultures” has been filled in. “Introductory ma- 
terial” is Ed’s own formulation for the default entry “introduction” from his 
general knowledge about document structures (see the initial state of the 
document scheme at the top of Fig. 4.28). The statement “big difficulty: com- 
municating effectively between cultures” has been attached to the current out- 
line item “introductory material”. 

From the document theme window we learn that Ed has a first guess about 
the document’s topic. The sentence he underlines (see the external output 
window) is his current best paraphrase of the document theme. Whereas he can 
safely link the meaning item “big difficulty: communicating effectively be- 
tween cultures” to the introduction (see the document scheme representation) 
because he found it there, he can only assume that the quoted phrase really re- 
phrases the unknown topic. This is indicated by question marks in the docu- 
ment theme window. 



Working step Mackin-4: Ed recognizes the document theme



As early as during his fourth working step, Ed finds out what the document is about. The 
author announces his theme clearly: “In this article, I use ...”, and Ed reacts by
underlining the whole topic statement. Whereas Ed has made only a provisional guess as to the
document theme after his first working step, he now obtains positive knowledge. What 
he discovers is the author’s short summary of the article. It will be used in the abstract. 

While the exploration technique brings no new features, the way the relevance 
assessment strategies identify and exploit the author’s summary merits the 
reader’s attention. 

Relevance assessment and recognition of the article's theme. Fortunately, 
the author is more than explicit about what he is going to present to his readers. 
Ed learns from two broad textual hints (“In this article I use ...” ... “I conclude 
by ...”) that he is reading relevant information. Statements of this sort are good 
old friends to every professional summarizer. 

relevant-texthint provides established interpretations for current indicator 
phrases. As the author points out so nicely that he states his paper’s organiza- 
tion (“I use ...” “I conclude by” - interpreted by the strategy relevant-texthint), 
Ed can recognize that he is dealing with thematic information and he can even 
allocate a part of the topic sentence (“some solutions ...”) to the right outline 
heading, the conclusion, and the remainder vaguely to the middle part of the 
paper. 

Fig. 4.29. Working step Mackin-4: Ed recognizes the document theme

relevant-scheme compares an input unit with the available document structure
knowledge and attaches the item to the outline (the scheme representation) if 
possible, always using an elaboration relation. “In this article I use the 
problems involved in communicating ...” can be attached to the middle part of 
the paper, where quite normally the author deals with a problem, while “I con- 
clude by posing some solutions...” announces the content of the conclusion. 

relevant-call tries to link the meaning item to the thematic structure, testing 
all available relations. In the present case, the restatement and the elaboration 
relations work. A restatement is a synonymous formulation of the same con- 
cept. An elaboration is a restatement with an added semantic component which 
enlarges our knowledge. 

Finding a restatement relation involves a knowledge base and some inferenc- 
ing (see Sect. 4.3.4 for more formal discussion). Here, 

big difficulty: communicating effectively between cultures 

the current best formulation of the theme, reappears in the text passage empha- 
sized by the author three times with slight variations in wording and content: 

problems involved in communicating across the barrier of Japanese language 
and culture into English 

difficulties that may be inherent in any cross-cultural translation 
the communication barriers international companies face 

The last example appears to be a less complete paraphrase of the hypothetical 
theme. It states the difficulty of communication (“communication barriers”), 
but factual knowledge is necessary to imagine that international companies 
have communication problems which are as international as the company, i.e., 
communication problems between different cultures. 

Since all text items under consideration are embedded into additional 
material, they also qualify as elaborations of the theme. For instance, the 
reader learns that “problems involved ... Japanese language and culture into 
English” are considered as examples for the general communication problem 
between nations. This relationship was not known to the summarizer before. 
Other relations are also present: relevant-call can build up an examplified-by 
relation and a solutionhood relation. All relations are used to attach the 
candidate items to the theme. In the bottom theme representation of the 
working step, only a few of the most obvious relations have been entered. 

relevant-summary tries to detect summary qualities in a text passage before 
installing a summary relation to the theme. It checks if the passage under con- 
sideration has been attached to the thematic core by relevant-call with one or 
more elaboration relations. Then relevant-summary tests whether the candidate 
comprises several propositions that reasonably cover the whole document. If 
they do, relevant-scheme has been able to attach them to outline positions. The 
feature of “textualized outline” makes an in-text summary differ from a topic 
sentence. The strategy relevant-summary attaches the whole passage to the
theme representation, using a summary relation. 
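
The conditions tested by relevant-summary can be summed up in a small Python sketch. It is a simplification under stated assumptions: the function signature is hypothetical, and the threshold of two outline items is an assumed stand-in for the judgement that the propositions "reasonably cover the whole document".

```python
def relevant_summary(theme_relations, outline_positions, min_outline_items=2):
    """Decide whether a passage qualifies as an in-text summary.

    theme_relations   -- relations by which relevant-call attached the passage
                         to the theme
    outline_positions -- for each proposition, the outline item found by
                         relevant-scheme (None if it could not be attached)
    min_outline_items -- assumed threshold for reasonable coverage
    """
    if "elaboration" not in theme_relations:
        return False  # not bound to the thematic core by an elaboration
    if any(position is None for position in outline_positions):
        return False  # some proposition has no place in the outline
    # Several propositions spread over several outline items:
    # the feature of a "textualized outline".
    return len(set(outline_positions)) >= min_outline_items


# Mackin-4, simplified: propositions attached to middle part and conclusion.
is_summary = relevant_summary(
    ["restatement", "elaboration"], ["middle part", "conclusion"]
)

# A plain topic sentence: one proposition at one outline position.
is_only_topic_sentence = not relevant_summary(
    ["restatement", "elaboration"], ["introduction"]
)
```

The second call shows the difference to a topic sentence: it may restate and elaborate the theme, but it does not spread over the outline.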

The result. The thematic structure has gained both in reliability and content, 
although the theme is still known from restatements only. But Ed now has two 
restatements of the unknown topic, and the newly acquired one is highly 
credible because of the author’s heavy emphasis. Thus, both restatement rela- 
tions find themselves consolidated to valid links (see lower document theme 
window). The new topic paraphrase is organized by RST-style semantic rela- 
tions and reads: 

the problems involved in communicating across the barrier of the Japanese lan- 
guage and culture into English as a general example of difficulties that may be 
inherent in any cross-cultural translation. ... some solutions to the communi- 
cation barriers international companies face 



Working step Mackin-5: Ongoing expansion of the thematic structure 



Ed detects how the author intends to organize the core of his article: 

Communication can fail in three ways: wrong content, wrong structure, or wrong pre- 
sentation. I now examine each of these problems in turn. For each of these problem 
areas, I introduce some significant articles... 

The summarizer adds the newly discovered theme knowledge to the existing structure. 
The basic intellectual activity is the same as before: Ed agglomerates knowledge from 
the text to the mental representation of the document in his memory. 

Almost the same strategies appear as in the former step (compare Fig. 4.30). 
There is nothing wrong in this observation. An expert summarizer’s routine
relies on elaborate intellectual tools with a high reuse frequency.

Our discussion deals with relevance assessment. It begins as soon as the next 
interesting text passage is available in the document surface representation. 

• relevant-unit. The passage under consideration occurs right at the
beginning of the paragraph. There it is likely to catch the reader’s and
summarizer’s attention. The strategy relevant-unit exploits text position
information. It recommends the passage as possibly relevant.

• relevant-scheme recognizes that the passage elaborates (adds knowledge 
to) the document scheme representation. From general document structure 
it was known that the current paper must have a middle part, but it was 
open what this core of the paper looks like. Now the structure of the pa- 
per’s body is known to be tripartite and to deal with content, structure, and 
presentation problems. The respective propositions can be entered into the 
scheme representation. 

• relevant-texthint. Strong verbal clues are provided by the author. The
strategy relevant-texthint deals with them. “I now examine each of these
problems in turn. For each ... I introduce ...” is clearly a double indicator
phrase for a summary and a description of the paper’s outline. Ed is on his 
guard and underlines the passage. Just as a summarizer has a knowledge 
base of interpreted indicator phrases in his mind, the strategy relevant- 
texthint has at its disposal a knowledge base (or dictionary) of indicator 
phrases. It lists not only the wordings of such phrases, but also their uses. “I 
examine ... in turn” and “I introduce articles” are characterized there as to 
their function by expressions saying something like “the author lists his 
outline points” and “the author announces a review of papers”. 

• relevant-call tests whether the propositions gleaned from the text passage 
hook up to the document theme by any available semantic relation. In the 
present case, the cause-result link works: communication difficulties can 
result from wrong content items as well as from wrong structure and wrong 
presentation. The propositions can be attached to the theme representation 
by causal relations. 

• The relevant-summary strategy is specialized in the assessment of text 
passages as summaries. The newly added information complex justifies a 
restatement link, because the concept communication problems appears 
both in the old thematic structure and the candidate propositions. The pro- 
positions add new information about the cause or effect of the commu- 
nication problems. Thus they comply with the demands of an elaboration 
link. The author’s double indicator phrase makes perfectly clear that the 
propositions about wrong content, wrong structure and wrong presentation 
state the paper’s outline. Thus the conditions of an in-text summary are 
met. The summary relation can be established. 
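
The dictionary of indicator phrases used by relevant-texthint can be pictured as a lookup table that pairs wordings with their uses. The Python sketch below is a minimal illustration: the two interpretations are taken from the description above, whereas the regular expressions, the list layout, and the function signature are assumptions made for this example.

```python
import re

# Hypothetical miniature dictionary: indicator-phrase pattern -> interpreted use.
INDICATOR_PHRASES = [
    (re.compile(r"\bI (now )?examine\b.*\bin turn\b", re.IGNORECASE),
     "the author lists his outline points"),
    (re.compile(r"\bI introduce\b.*\barticles\b", re.IGNORECASE),
     "the author announces a review of papers"),
]


def relevant_texthint(passage):
    """Return the interpreted uses of all indicator phrases found in the passage."""
    return [use for pattern, use in INDICATOR_PHRASES if pattern.search(passage)]


passage = (
    "Communication can fail in three ways: wrong content, wrong structure, "
    "or wrong presentation. I now examine each of these problems in turn. "
    "For each of these problem areas, I introduce some significant articles..."
)

uses = relevant_texthint(passage)
```

For the Mackin passage both entries fire, which corresponds to the double indicator phrase noted above. A passage without any listed phrase yields an empty list.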

The outcome. The hold strategy enters the newly acquired text knowledge into 
the document scheme and theme representation. The information from the cur- 
rent text passage is safely anchored to the document representation by several 
relations. 

Ed now knows enough to write the topic sentence of the abstract. For the 
sake of simplicity, only the cause-result relation has been entered in the 
graphic representation of the working step. The scheme knowledge has been 
extended as well. 







Working step Mackin-7: Checking the theme against the outline 



Only in a superficial interpretation does the exploration of the paper's outline appear as 
the main activity in working step 7. On closer inspection, we find the strategic docu- 
ment user Edward checking his current understanding of the paper's subject by compar- 
ing it to the outline. 

The main goal of the working step is expressed by the leading strategy check- 
inform (check information whenever appropriate - see Fig. 4.31). It belongs to 
those general strategies which can be called upon at any time, interrupting the 
work of other strategies. 

Without the need to read much material, simply by combining and compar- 
ing different formulations of the article’s theme, the summarizer can establish 
enough information safety to write. This works because of text constraints that 
hold between outline, summary and document: 

• What an outline item (a heading) promises, must be in the body text 
(strategy head-in-text). 

• What a summary says, must figure in the body text (strategy sum-in-text). 

Two short presentations of the thematic core, such as an in-text summary and 
the outline, may support each other by their agreement. 
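
The mutual support of an in-text summary and the outline can be illustrated by a simple overlap measure. The Python sketch below rests on loud assumptions: both text parts are reduced to sets of concept labels, the outline labels are hypothetical (the actual headings of the Mackin paper are not reproduced here), and the ratio is only one conceivable way to express agreement.

```python
# Concepts stated by the in-text summary found in step 5.
in_text_summary = {"wrong content", "wrong structure", "wrong presentation"}

# Hypothetical outline, reduced to concept labels for this example.
outline = {"wrong content", "wrong structure", "wrong presentation", "solutions"}


def mutual_support(summary, outline_items):
    """Share of summary concepts that the outline confirms (0.0 to 1.0)."""
    if not summary:
        return 0.0
    return len(summary & outline_items) / len(summary)


support = mutual_support(in_text_summary, outline)
```

Because both the headings and the summary constrain the body text (head-in-text, sum-in-text), a high agreement between the two lets the summarizer trust his theme knowledge without reading the body in full.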

Before informational checking can begin, Edward must explore the outline of 
the paper and recognize it as relevant. 

Exploration of the outline. In the Mackin paper the outline remains integrated 
in the text body and is given by the headings of text components (see exhibit). 
Since the summarizer is not interested in a full exploration of the document, he 
must be able to restrict his attention to the headings, disregarding anything 
else. To achieve this behavior, the explore strategy calls upon outline percep- 
tion strategies. Exploration intentions such as looking at headings are defined 
first. For practical reasons, they are mapped to a layout view given by the skim 
strategy. The external reading strategies use layout features (e.g., highlighting) 
in order to find and copy the desired units. 

Strategies installing perceptual intentions influence exploration in the follow- 
ing way: 

• ex-form. Whereas mostly the body text is at the focus of a reader’s interest, 
this is precisely not what the summarizer Edward wants to see at the mo- 
ment. He explores the formal document organization instead (strategy ex- 
form - explore the formal document organization). 

• heading. The summarizer currently looks for headings of text passages. 
This intention is described by the heading strategy. In the Mackin paper, 
the outline is presented in the form of text-integrated headings. 

• unit. The unit strategy restricts perception to limited information units 
(paragraphs, headings) at a time, as opposed to lengthy text stretches. 

Fig. 4.31. Working step Mackin-7: Checking the theme against the outline 

Exploiting the (optical) layout view of an outline. The outline information 
can easily be recognized by its layout features on a printed page, i.e., by bold- 
face and some isolation from body text. An optical skim strategy which selects 
highlighted items brings the right results. Its advantage is to use visual features 
which the external reading strategies can apply directly. 

External reading. Two external reading strategies suffice to implement the 
skim strategy: read-form gets fonts and their layout; read-over blocks out long 
passages that Edward does not want to know by delivering a shallow place- 
holder representation in the style of gray blocks without interpretation. 

As a result of information acquisition, the newly read part of the outline is 
available in the document surface representation. 
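
The interplay of skim, read-form and read-over can be illustrated by a minimal sketch. It is not the SimSum implementation; the Unit record, the layout flags and the sample headings are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One text unit on the page (hypothetical record, not from the book)."""
    text: str
    bold: bool = False      # layout feature: boldface
    isolated: bool = False  # layout feature: set apart from the body text


def read_form(unit):
    """Copy a highlighted unit into the document surface representation."""
    return {"kind": "heading", "text": unit.text}


def read_over(unit):
    """Deliver an uninterpreted 'gray block' placeholder for skipped text."""
    return {"kind": "gray-block", "length": len(unit.text)}


def skim(page):
    """Keep highlighted units and block out everything else."""
    return [read_form(u) if (u.bold and u.isolated) else read_over(u) for u in page]


# Invented sample page; the headings are placeholders, not the Mackin headings.
page = [
    Unit("First heading", bold=True, isolated=True),
    Unit("Body text that the summarizer does not want to read now."),
    Unit("Second heading", bold=True, isolated=True),
]
surface = skim(page)
headings = [item["text"] for item in surface if item["kind"] == "heading"]
```

The sketch keeps the wording of the headings only; body text survives merely as a placeholder that records its length.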

Attaching new outline information to the already known. Only relevant in- 
formation enters the scheme and theme representation. In the present case, 
only the scheme representation is filled up with new data (see Fig. 4.31, bot- 
tom scheme window). In the scheme, the attachment of new items is straight- 
forward: 

• Headings (sometimes also catchwords and other items) can be identified 
on the page by their layout features. They have been emphasized (by font, 
isolated position, color, or whatever optical means) because the author 
wants readers to notice them. A relevance strategy called relevant-form- 
hint states that highlighting by layout features confers relevance to an item. 

• As they carry precious information about the document structure, entries in 
the document scheme are by definition important to a summarizer. More 
specific entries in the outline specify global entries. Between global and 
more specific headings, an elaboration relation holds. The strategy relevant-scheme 
makes use of this relation by attaching new outline items with an elabora- 
tion link to the existing scheme representation. 

After exploration and relevance assessment, the document scheme represents a 
fair part of the document outline. The informational basis for information 
checking, the main activity of the working step, is now set. 
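
How relevant-form-hint and relevant-scheme cooperate may be sketched as follows. The data layout (dictionaries with a highlighted flag and a parent entry) and the sample outline are assumptions for illustration; only the two strategy names and the elaboration relation come from the model.

```python
def relevant_form_hint(item):
    """Highlighting by layout features confers relevance on an item."""
    return item.get("highlighted", False)


def relevant_scheme(scheme, item):
    """Attach an outline item to its more global entry by an elaboration link."""
    scheme.append((item["parent"], "elaboration", item["text"]))


# Invented outline items; 'parent' names the more global entry (assumption).
outline = [
    {"text": "Section A", "highlighted": True, "parent": "Paper title"},
    {"text": "an ordinary body sentence", "highlighted": False, "parent": "Section A"},
    {"text": "Subsection A.1", "highlighted": True, "parent": "Section A"},
]

scheme = []  # triples: (global entry, relation, more specific entry)
for item in outline:
    if relevant_form_hint(item):
        relevant_scheme(scheme, item)
```

Items without layout emphasis never reach the scheme, which mirrors the statement that only relevant information enters the representation.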

Information checking. Information checking (strategy check-inform) works 
with the text constraints of the strategies sum-in-text and head-in-text. Accord- 
ing to them, in-text summaries and headings give valid information about the 
document. If two different sources about the document theme agree, their con- 
tent is all the more reliable. In the current working step, Edward has gleaned his 
first reading from body text; the second comes from the outline. 

The strategy compare matches entries from both representations (see Fig. 
4.32). The comparison of theme and scheme representation confirms the 
tripartite organization of the theme according to content, structure, and presen- 
tation. But how can one recognize that it does not matter whether content, 
structure, and presentation are characterized as being wrong (theme represen- 
tation) or appropriate (scheme representation)? An inference strategy of 
general knowledge processing is called upon. It looks at the knowledge base 
and finds a suitable rule record that tells it to match the negative and the 
positive expression of a problem. Thus it is able to impose an equivalence bet- 
ween both aspects. 

Fig. 4.32. Consolidating the interpretation by comparison of different sources 



The “difficulty of effective intercultural communication” from the theme repre- 
sentation appears in the outline scheme under the wording “translation 
problem, why translation seldom succeeds” and “what can we do to success- 
fully cross the barrier”. Because paraphrases are used to state the problem, 
some knowledge-based inferencing is needed before the formulations from the 
in-text summary and from the outline can be matched by the compare strategy. 
After that, they support each other. Edward can feel safer about his interpreta- 
tion of the paper. The check-inform strategy has succeeded in consolidating 
information. 
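
The matching performed by compare, supported by the inference rule that equates the negative and the positive expression of a problem, can be sketched in a few lines. The pair representation (qualifier, aspect) and the contents of the rule record are assumptions for illustration.

```python
# Assumed rule record: a problem may be stated negatively or positively.
EQUIVALENT = {("wrong", "appropriate")}


def same_problem(a, b):
    """Two entries match if they concern the same aspect and their
    qualifiers are identical or declared equivalent by the rule record."""
    (qual_a, aspect_a), (qual_b, aspect_b) = a, b
    if aspect_a != aspect_b:
        return False
    return (
        qual_a == qual_b
        or (qual_a, qual_b) in EQUIVALENT
        or (qual_b, qual_a) in EQUIVALENT
    )


def compare(theme, scheme):
    """Return the theme aspects that are confirmed by the scheme."""
    return [t[1] for t in theme if any(same_problem(t, s) for s in scheme)]


theme = [("wrong", "content"), ("wrong", "structure"), ("wrong", "presentation")]
scheme = [
    ("appropriate", "content"),
    ("appropriate", "structure"),
    ("appropriate", "presentation"),
]
confirmed = compare(theme, scheme)
```

Without the rule record the two sources would not match at all, since no qualifier is literally shared between them.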



Working step Mackin-13: Writing the topic sentence 



Ed drafts the topic sentence of his abstract. He draws it from the theme representation 
whose construction we have observed in the preceding steps of the sequence. 

We are confronted with a quite typical text production act during professional 
summarizing. What happens is best characterized as skilled copying of se- 
lected material (“cutting and pasting”) from the source text. 









When Ed writes the first sentence of his abstract, he knows the core outline 
and thematic structure of the paper and in principle also the wording of the 
important document passages (see Fig. 4.33). In addition, he looks at the first 
page of the paper and concentrates on the passages which he has underlined. In 
the external input, the gray zone represents the text area Ed looks at. This con- 
tains the passages which he underlined and whose wording he may want to re- 
cover. When Ed concentrates on the underlined passages, he may also perceive 
immediately surrounding text areas such as the phrase “I now examine each of 
these problems in turn”. 

Most interesting in our current context is how the theme representation is 
presented as a topic sentence. This entails that core semantic material must be 
selected, subsumed under the semantic form of a topic sentence, linearized 
and equipped with a linguistic surface. Readers who want to study the whole 
step in detail are referred to the SimSum simulation. The general introduction 
to summary production by cutting and pasting in Sect. 4.3.4 may also be worth 
looking at. 

By comparing the theme representation and the output topic sentence, we 
learn that Ed has used the outer effects (or causes) of bad communication, i.e., 
flaws in content, structure, and presentation of materials, then the example the 
author dwells upon, namely the Japanese-English communication barriers, and 
last the general context of cross-cultural translation. In the following, we track 
the production of the written topic sentence. The first subtask is to configure 
the semantic text plan, the second to linearize and express it. 

Planning the topic sentence. We take for granted that the construct strategy 
manages the current text construction step (see Fig. 4.33). Its first subtask is to 
obtain a semantic plan for the topic sentence. In general, the construct-plan 
strategy builds content plans for sentences, but topic sentences have a special 
form, they occur frequently, and they are particularly important. So a special- 
ized strategy named topic provides the plans of topic sentences. It is supposed 
to choose them from a small knowledge base and to accompany the standard 
semantic format with a standard formulation such as “x is examined for y”. 

In the present case, topic comes up with a standard semantic format that 
names the phenomenon of concern, puts it into a larger framework (an applica- 
tion context, a domain of science, or something else) and characterizes its ef- 
fects or impact. 

Ed constructs his topic sentence from this semantic structure by entering data 
from the thematic representation. He starts from the main object, the problem 
or phenomenon. It is instantiated with the English-Japanese communication 
problems already well known to the reader. As required by the topic sentence 
format, they are put into a larger framework, here intercultural communication 
in general. 

Then the impact of the main problem or phenomenon, its applications, causes 
or effects are stated. Again using material from the theme representation, 
Edward fills in what can go wrong: content, structure, and presentation. Now 
the core material of the theme representation has been integrated into a valid 
pattern for topic sentences. The strategies topic and construct-plan have suc- 
ceeded. The semantic plan of the topic sentence can be handed over to the 
formulation strategies. 

Formulation. The next important text production task is to construct a linguis- 
tic realization for the semantic sentence plan. The formulation strategies take 
care of this problem. From the thinking-aloud protocol we see that neither line- 
arization nor finding the right words (lexicalization) is easy in the present case. 
For this reason, the formulation strategy is accompanied by form-increment 
which makes it work incrementally, i.e., undertake as many approximations as 
necessary to solve a problem. 

Linearization needs two tries. Ed gives up his first attempt to start the topic 
sentence with “communication”, i.e., with the general point of the author. He 
cancels his first letters (see exhibit, especially the thinking-aloud protocol and 
the external output), and decides to bring the manifest causes of the communi- 
cation problems first: “The effects of wrong content...”. 

In the case of a topic sentence, formulation is eased by the skeleton that 
came with the topic sentence format from the knowledge base of the topic 
strategy. Something like “The consequences of x on y are examined for their 
general implications on z” is provided. 

Since professional summarizers often use standard formulations to express 
common semantic patterns, we have a general pattern strategy equipped with a 
dictionary of wordings for the well-known semantic formats. It contributes the 
sentence shells and ready-made formulations from the source paper to fill them 
out. 

Lexicalizing the concepts of the theme representation is accomplished 
through borrowing their wording from the author. Ed recovers the precise formu- 
lation by looking at the passages he underlined (strategies refresh, explore, 
marked, and read-form). The strategy reorganize integrates the different 
pieces of text into a regular sentence. Ed slightly reorganizes communicating 
into communication and adjoins effectively taken from the very first theme 
paraphrase, after changing it from adverb to adjective. At the end, the 
construct strategy checks the result. It has a missing lexicalization which is 
beyond repair for the moment, but the remainder is ready for writing. 

4.3.6.3 The Trueby sequence of online abstracting 

Summarizer: Marliese Gunther, FIZ Karlsruhe, Germany 

The following segment from the Trueby sequence demonstrates what the sum- 
marizer herself calls online abstracting: she reads the document or a relevant 
section of it in one go and writes down the information which then forms the 
abstract. She needs to revise very little, and on the whole she does it right the 
first time. Her straightforward technique sounds simple at first; however, it im- 
plies that she is able to write a passage of a summary before knowing most of 
the text to be summarized. By performing as she does, Marliese contradicts the 
assumption that it is necessary to understand a document in order to summarize 
it. Instead, an expert needs to have understood at any moment just enough to 
produce reliable target information. 

Online abstracting. Online abstracting is most impressive when the abstract- 
ing process is carried out on an almost unknown document. In the selected five 
working steps Marliese begins to process an article by P. Trueby and H. W. 
Zoettl about the health of trees under heavy metal deposition. The five-step se- 
quence illustrates the principle of Marliese's online abstracting clearly: she 
moves sequentially through the document and extracts useful information from 
it for the abstract until the abstract is filled or no more important new informa- 
tion is forthcoming. Practically speaking, she drops irrelevant information and 
either simply copies relevant passages from the original or reformulates them. 
Marliese begins to produce target text on the spot: 

Step 1. She reads the title. 

Step 2. She writes the topic sentence of the abstract. 

Step 3. She formulates the second sentence of the abstract. 

Step 4. She declines information for the first time. 

Step 5. She rejects another passage. 

Metacognitive activities. Metacognitive monitoring and steering does not ap- 
pear all the time inside working steps. Often, steering comes from behind the 
scenes. One step after the other is executed. The leading strategy simply links 
the step to the overall working plan. 

Marliese is a person who vocalizes her pushing through the working steps 
more than others. Her volitional effort is expressed in the thinking-aloud proto- 
col by phrases such as go on (in German weiter) or related expressions such as The 
text goes on. The volitional effort that keeps the person going to the next sub- 
task as soon as the former is completed is attributed to the next strategy. It can 
be observed in the steps Trueby-3 and Trueby-4 (Figs. 4.37 - 4.38). 

Fig. 4.34. The Trueby abstracting process 

The production strategy that presses for immediate production of written output 
is also a metacognitive strategy. It transmits the summarizer’s will to the local 
leading strategy in a step. It is at work in steps 2 and 3. 

The overall process. The reader is invited to identify the five sample working 
steps in the overall process diagram presented in Fig. 4.34. The interesting area 
at the top left is marked by arrows in the margin. A general explanation of 
process diagrams is given above in Sect. 4.3.5. The steps themselves are listed 
above. Therefore they are easily recognized in a different presentational form: 

• During all five steps, Marliese is busy extracting information from the 
source article. Her work starts out from the source representation. 

• Step one writes into the document representation. This means that it has 
brought about relevant information, namely the title. 

• Steps two and three, in addition, enter information into the draft summary: 
Marliese is already producing her abstract. 

• The information acquired by steps four and five is stopped at the document 
surface representation. This is due to lack of relevance. 
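
The flow of information through the three representations during the five sample steps can be summed up in a small sketch. It is a simplification under stated assumptions: the relevance decisions are given as input flags rather than computed, the unit descriptions are invented paraphrases, and the function name is hypothetical.

```python
def online_abstracting(units, max_sentences=10):
    """Move sequentially through the document, writing relevant
    passages into the draft as soon as they have been read."""
    surface, document, draft = [], [], []
    for kind, text, relevant in units:
        surface.append(text)      # everything that is read reaches the surface
        if not relevant:
            continue              # stopped at the surface for lack of relevance
        document.append(text)     # relevant items enter the document representation
        if kind != "title":
            draft.append(text)    # passages are exploited for writing at once
        if len(draft) >= max_sentences:
            break                 # the abstract is filled
    return surface, document, draft


# The five sample steps; the texts are placeholders, the flags are assumed.
units = [
    ("title", "title of the article", True),
    ("passage", "statement of the subject of the study", True),
    ("passage", "statement about the emitters", True),
    ("passage", "restatement of the deposits", False),
    ("passage", "results from earlier work", False),
]
surface, document, draft = online_abstracting(units)
```

In this run all five units reach the surface, three enter the document representation, and two are written into the draft, as in the marked area of the process diagram.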

Reading the diagram beyond the marked area, one can follow Marliese step- 
ping through the document, dropping uninteresting passages or picking up state- 
ments for the abstract and writing them down. This is her activity during work- 
ing steps 1-50. Her behavior is regular with very few exceptions. After her pas- 
sage through the document, the draft summary is all prepared. She revises it 
slightly during steps 53-58. 

The organization of the Trueby process is remarkably simple, whereas the in- 
tellectual performance of the summarizer is not. Marliese is modular in her 
approach. She must be able to definitively treat every information item right at 
the moment she learns it from the original paper, in particular to assess its 
value for the abstract. This presupposes an expertise in the field that safeguards 
the summarizer from problems of understanding and evaluation. Online ab- 
stracting is observed less frequently because it is difficult. 



Working step Trueby-1: Reading the title 



Since Marliese begins a new working process in working step Trueby-1, she invests 
some effort in starting and initializing activity before she comes to grips with her pa- 
per. Thus we find two blocks of activity: 

• starting up 

• exploration 

Starting up. Switching from a period of rest to a working process needs some 
metacognitive activity or will (strategy start). Marliese states her intention 
(strategy plan), namely to report a lecture. We can assume that she activates 
her working plan for abstracting processes. In doing so, she also determines the 
document type. This is done by the strategy state-doctype. She prepares her 
mind for document exploitation (strategy start-explore). 

Readers may compare Marliese’s start procedure with that of Harold Borko 
in working step Judge-3 (Fig. 4.27) and that of Edward Cremmins in Mackin-1 
(Fig. 4.28). 

Processing the title. The document exploration activity is dominated by the 
strategy explore (see Fig. 4.35). Marliese needs no complicated reading inten- 
tions. She is guided by the document structure of a lecture (strategy by-form), 
and she accepts input in units as presented by the paper (strategy unit) for 
normal reading and understanding (strategy browse). The strategy read-form 
copies the title information into memory (see document surface window). Now 
the hold strategy makes the relevance assessment strategies decide whether 
something in the title is worth keeping for the summary. The striking argument 
is that the title is always relevant (strategy relevant-title). The hold strategy 
copies the title information into both the scheme representation and the theme 
representation of the source document. The working step is finished. 



Working step Trueby-2: Writing the topic sentence 



In the second working step of the Trueby sequence, the topic sentence of the abstract is 
written. 

As is often the case (compare working step Judge-3 - Fig. 4.27), the original 
document contains a relevant statement right at the beginning, since the main 
purpose of the introduction is to pave the reader’s way to understanding. Mar- 
liese finds the material for her topic sentence. 

Marliese switches to her online mode. This intellectual move is modeled by 
having the production strategy influence exploration and try an immediate ex- 
ploitation of newly read information for writing (see Fig. 4.36). As a conse- 
quence, explore has attached a construct subnetwork dedicated to target text 
production (see the active strategies in Fig. 4.36). 

Document exploration. Marliese orients herself according to the established 
document structure (strategy by-form). She takes care to remain in the top- 
level document parts, i.e., in the introduction (strategy top-level). She advances 
by units, allowing the document to dictate the next one, a paragraph (strategy 
unit). Reading is plain, accepting input for normal understanding without re- 
strictions (strategy browse). The optical reading strategy read-form copies the 
input to memory. 

Fig. 4.35. Working step Trueby-1: Reading the title 

Fig. 4.36. Working step Trueby-2: Writing the topic sentence 

The information acquisition configuration described above is common. The 
reader may compare the current step to step Goonatilake-66 (Fig. 4.52), where 
exploration uses exactly the same strategies, or to Mackin-4 (Fig. 4.29), 
where the guidance by the document structure (strategy by-form) is missing, 
but unit, top-level and browse are at work as well. 

Relevance assessment: positive relevance strategies. Four relevance assess- 
ment strategies cooperate to determine the relevance of what Marliese retains 
from her input. They argue as follows: 

• The position at the very beginning of the text and at the beginning and end 
of the first paragraph indicates the relevance of a statement (strategy 
relevant-unit). 

• The passage is introduced by a keyphrase which announces a core state- 
ment: “This study forms part of a project ...”. At the end of the paragraph 
this is underlined by the phrase “in the following ... is reported about”, 
which marks the theme of the paper as clearly as one could wish. The cue 
phrase is exploited by the strategy relevant-texthint. 

• Since Marliese already knows the title, she can also see that the statement 
in the first paragraph links up directly to the title (strategy relevant-call) 
with a restatement relation: both topic sentences are concerned with the 
amount of heavy metal deposits in forest trees. 

• Combining the indicator phrase discovered by relevant-texthint and the re- 
statement relation installed by relevant-call, the strategy relevant-topic- 
sentence can decide that the text passage is a topic sentence. Topic sen- 
tences are highly relevant. 

Marliese explicitly states that she has found relevant material (“I think it is 
important to mention...” - strategy relevant). 
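
The cooperation of the four positive relevance strategies may be sketched as a set of simple tests. This is a rough illustration, not the model's mechanism: word overlap stands in for the restatement relation, and the title and passage strings are invented test data, not the wording of the Trueby paper.

```python
CUE_PHRASES = ("this study forms part of", "in the following")
STOPWORDS = {"a", "in", "of", "on", "the"}


def relevant_unit(paragraph_index):
    """Position argument: the first paragraph is a prominent place."""
    return paragraph_index == 0


def relevant_texthint(text):
    """Cue-phrase argument: the author announces a core statement."""
    return any(cue in text.lower() for cue in CUE_PHRASES)


def relevant_call(text, title, min_overlap=2):
    """Restatement argument, approximated by shared content words."""
    shared = set(text.lower().split()) & set(title.lower().split())
    return len(shared - STOPWORDS) >= min_overlap


def relevant_topic_sentence(text, title):
    """A cue phrase plus a restatement link marks a topic sentence."""
    return relevant_texthint(text) and relevant_call(text, title)


def assess(text, title, paragraph_index):
    return {
        "relevant-unit": relevant_unit(paragraph_index),
        "relevant-texthint": relevant_texthint(text),
        "relevant-call": relevant_call(text, title),
        "relevant-topic-sentence": relevant_topic_sentence(text, title),
    }


# Invented strings for the test run (assumptions).
title = "Heavy metal deposits in forest trees"
passage = "This study forms part of a project on the heavy metal balance of forest trees"
votes = assess(passage, title, paragraph_index=0)
```

A passage from the first paragraph that lacks both cue phrase and link to the title would receive only the positional vote.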

Negative strategies of relevance assessment. The current working step also 
presents us with two strategies which assess meaning items as not relevant. 
Marliese ignores the middle section of the paragraph, which further explains 
the aim of the project and its development (strategy no-comment). No-comment 
is a negative relevance strategy. It cuts back information which is not the se- 
mantic core of a statement, but extends it. Such add-on information can have 
considerable size, without being of first priority. Therefore it can be discarded. 

The second negative strategy at work is no-detail. It states that detail is dis- 
pensable, especially if it is too much for a short summary or of no use to its 
readers. With this in mind, Marliese replaces the name of the Stolberg district 
by the general term “a location”. Generalization abstracts the detail away by 
stepping up the hierarchy of a thesaurus (or ontology) and returning a more ge- 
neral concept. 

Writing the topic sentence. The writing part of the current working step pre- 
sents itself for comparison with step Mackin-13 (Fig. 4.33), where a topic sen- 
tence is also produced. In particular, in both steps the same strategies 
(construct-plan and topic; pattern, ready-made, and reorganize) deal with text 
planning and with formulation work. 

Content planning. The semantic form of Marliese’s topic sentence names the 
subject of the lecture, anchoring it in the framework of a larger project. The 
topic strategy must have delivered a matrix such as 

Within the framework of a project x, y was analyzed 

The construct-plan strategy fills this skeleton of a topic sentence with two con- 
tent points from the paper: 

• x is given content from the project title “Atmogenous and geogenous com- 
ponents in the heavy metal balance of forest trees”. 

• y is taken from a text passage “trees influenced by high atmogenous de- 
posits in the district of Stolberg” where the name of the location has been 
abstracted away. 
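
Filling the topic sentence matrix, including the generalization performed by no-detail, can be sketched as follows. The one-entry thesaurus and the function names are assumptions; the matrix and the two content points are those quoted above. The raw result still needs the smoothing that reorganize provides.

```python
# Assumed thesaurus entry: stepping up from the specific place to a general term.
BROADER = {"the district of Stolberg": "a location"}


def no_detail(text):
    """Replace dispensable detail by the more general concept."""
    for specific, general in BROADER.items():
        text = text.replace(specific, general)
    return text


def topic(x, y):
    """Standard matrix of the topic sentence as quoted in the text."""
    return f"Within the framework of a project {x}, {y} was analyzed."


def construct_plan():
    x = '"Atmogenous and geogenous components in the heavy metal balance of forest trees"'
    y = no_detail("trees influenced by high atmogenous deposits in the district of Stolberg")
    return topic(x, y)


raw_sentence = construct_plan()
```

The sketch stops at the semantic plan with borrowed wording; agreement and word order are left to the formulation strategies.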

Formulation. Formulation presents no special problems in the current working 
step (compare step Mackin-13 where formulation is more laborious). When for- 
mulating (strategy formulation), Marliese uses the proposal from the knowl- 
edge base of the topic strategy and adapts ready-made items from the source 
document (strategy ready-made). The pattern strategy houses the source docu- 
ment information in the standard sentence pattern to form one sentence. After 
that, the reorganize strategy integrates the bits and pieces. Marliese writes 
down her result (strategy write). 



Working step Trueby-3: Producing the second summary sentence 



Marliese continues in her instant production style and writes the next summary sen- 
tence. It explains who is responsible for the heavy metal pollution of the trees, provid- 
ing a direct extension of the topic sentence. An intersentential link attaches the exten- 
sion to the topic sentence. 

While Marliese’s information acquisition activity can be explained as in the 
previous working step, her relevance checking (see Fig. 4.37) merits attention. 
The same is true for the way the target sentence is produced. 

Judging relevance and installing a cause-result relation. The contributions 
of the strategies relevant-texthint, relevant-scheme, and relevant-call can be 
briefly stated as follows: 

[Figure: external output “The emitters are mining plants that have been located here for many decades.”] 
Fig. 4.37. Working step Trueby-3: Producing the second abstract sentence 

• relevant-texthint exploits positive hints of the authors. Because of the indi- 
cator phrase “proved suitable for our study”, it draws the attention to the 
statement 

The district of Stolberg, where mining has always been carried out and where 
such plants (sc. local emitters, such as metallurgical plants or steel works) 
are located, therefore proved suitable for our study. 

• The strategy relevant-scheme attaches the passage to the current outline 
item (the introduction) and thus shallowly argues for its importance. As 
always it uses an elaboration relation. 

• relevant-call attaches the statement about the emitters to the thematic 
structure. It uses a cause-result relation. They are responsible for the pollu- 
tion around Stolberg. 

In the present case, input speaks about polluters, but does not contain any 
straightforward statement that they cause the atmogenous deposits. We only 
learn that deposits occur in the vicinity of polluting plants. Marliese must insert 
the missing causal link from her own factual knowledge. She knows that 
deposits are caused by emitters. The relevant-call strategy adapts its process- 
ing to the link which it tries to infer. When trying to set up a causal relation- 
ship it searches the knowledge base for any positive knowledge to support such 
a relation. 

When the relevant-call strategy tries to attach the emitters by a causal rela- 
tion, it must work with factual knowledge from Marliese’s knowledge base. It 
must find there causal entries of the sort: 

emitters cause deposits 
deposits are caused by emitters 

In order to identify mining plants, metallurgical plants, or steel works as emit- 
ters, the knowledge base must categorize them so, containing entries such as 

mining plant is-a emitter 
metallurgical plant is-a emitter 
steel works is-a emitter 

Since we have provided appropriate knowledge about a causal relationship, 
relevant-call succeeds and attaches the emitters to the thematic structure as 
causing pollution. 
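
The knowledge-based attachment performed by relevant-call can be made concrete in a short sketch. The dictionary and set used as knowledge base, and the function names, are assumptions; the is-a and causal entries are those listed above.

```python
# Knowledge base entries as listed in the text (representation assumed).
IS_A = {
    "mining plant": "emitter",
    "metallurgical plant": "emitter",
    "steel works": "emitter",
}
CAUSES = {("emitter", "deposit")}  # emitters cause deposits


def category(concept):
    """Step up to the category of a concept, if one is known."""
    return IS_A.get(concept, concept)


def relevant_call(candidate, theme_item):
    """Try to attach the candidate to a theme item by a cause-result
    relation; return the new link, or None if no support is found."""
    if (category(candidate), category(theme_item)) in CAUSES:
        return (theme_item, "caused-by", candidate)
    return None


link = relevant_call("mining plant", "deposit")
no_link = relevant_call("forest", "deposit")
```

The attachment succeeds only because the knowledge base both categorizes the plants as emitters and holds the causal entry; remove either and the strategy fails.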

After that, Marliese can state what she finds important enough to write 
(strategy relevant): “I’m going to mention that ... mining is carried out at this 
location...” 

Target sentence production. Marliese explains who causes the atmogenous 
deposits. This is the second sentence of her abstract. From the first idea as ut- 
tered by relevant, the sentence plan is articulated to mention the polluters and 
to characterize them as long-standing local mining plants (strategy construct- 
plan). The meaning items are picked up from the theme representation. 

Formulation (strategy formulation) itself does not present particular problems. 
Marliese can use ready-made items from the document and a standard sen- 
tence pattern (strategies ready-made, pattern). Interesting is the formulation of 
an intersentential relation (strategy connect). By placing the phrase “The emit- 
ters are...” at the beginning, Marliese links her current sentence to the previous 
one. Emitters are presupposed as known to the reader from the previous sen- 
tence where they are already implied. Instead of the definite article, conjunc- 
tions or adverbs may also serve as sentence-connecting devices. 



Working step Trueby-4: Rejecting redundant information 



The step Trueby-4 demonstrates how redundant information acquired from the source pa- 
per is rejected. 

Marliese continues to read (see Fig. 4.38): “The wooded areas bordering on the 
town were subject for decades to extremely high atmogenous deposits.” This 
statement is now assessed for whether it produces anything substantial. 

No positive viewpoints come up; the negative relevance strategy once (say it 
only once, no redundancies) insists on its elimination for redundancy. “I’ve 
just mentioned that in my sentence”, Marliese says. Since the two statements 

... emitters: mining plants have been located here for decades 
... were subject for decades to extremely high atmogenous deposits 

differ in their content, a general knowledge processing inference strategy is 
called on to recognize by looking up in the knowledge base that they both as- 
sert atmogenous pollution in the interesting area. The input is not kept but ex- 
plicitly excluded (strategy exclude). 
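
The redundancy test of the once strategy, with the inference step that reduces differently worded statements to what they assert, can be sketched as follows. The lookup table standing in for the knowledge base and the label of the shared assertion are assumptions.

```python
# Assumed knowledge base lookup: what a statement asserts.
ASSERTS = {
    "emitters: mining plants have been located here for decades":
        "atmogenous pollution in the area",
    "were subject for decades to extremely high atmogenous deposits":
        "atmogenous pollution in the area",
}


def inference(statement):
    """Reduce a statement to the assertion recorded in the knowledge base."""
    return ASSERTS.get(statement, statement)


def once(new_statement, retained):
    """True if the new statement only repeats what has been said already."""
    claim = inference(new_statement)
    return any(inference(old) == claim for old in retained)


retained = ["emitters: mining plants have been located here for decades"]
incoming = "were subject for decades to extremely high atmogenous deposits"
redundant = once(incoming, retained)  # input to be handed to exclude
```

A statement without a knowledge base entry is compared literally, so new information is never rejected by accident in this sketch.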



Working step Trueby-5: No results from the work of others 



Marliese refuses a source information item because it is old. 

Marliese reads the next text passage as in the previous step and begins to con- 
sider the relevance of the newly acquired passage (see Fig. 4.39). 

[Figure panels: document surface, document scheme, document theme, active strategies, protocol segment] 
Fig. 4.38. Working step Trueby-4: Rejecting redundant information 

Fig. 4.39. Working step Trueby-5: No results from the work of others 

What she learns, however, does not correspond to the standards for information 
that can go into an abstract. While it is legitimate to include important results
of earlier studies in a paper, second-hand information is not appropriate in an
abstract. The negative relevance strategy own-only (only own results of the
authors in the present paper) keeps such information out. own-only is supported
by the inference strategy: inference looks at the citation and the publication
years to decide that this information must be old.

Marliese rejects the passage (strategy exclude). 
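
The cooperation of own-only and inference can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model's implementation: the function names, the four-digit year pattern, and the rule "a cited year earlier than the paper's own year signals second-hand information" are all hypothetical, and the sample passages are invented.

```python
import re

# Assumption: a publication year is a four-digit number from 1800 to 2099.
YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def cited_years(passage: str) -> list[int]:
    """Return every four-digit year that occurs in the passage."""
    return [int(year) for year in YEAR.findall(passage)]


def own_only_excludes(passage: str, paper_year: int) -> bool:
    """Hypothetical own-only check supported by a simple inference.

    The passage is treated as second-hand (and therefore excluded) if it
    cites at least one year earlier than the year of the paper itself.
    """
    return any(year < paper_year for year in cited_years(passage))


# Invented sample passages, for illustration only.
second_hand = "Earlier measurements (Author 1975) showed high deposits."
own_result = "We measured high deposits at all sampling sites."
```

A passage with an older citation is rejected (exclude); a passage without one is passed on to the other relevance strategies.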



4.3.6.4 The Hearn sequence: How a document type-specific working plan
is developed and applied 

Summarizer: Ingetraud Dahlberg, INDEKS, Frankfurt, Germany

The Hearn sequence demonstrates document-type-specific planning of an ab- 
stracting process, namely of abstracting a review paper. As early as in the first 
working step, the summarizer notes the document type. Since a review article 
reports the research in a specific field over a given period, the work reviewed 
is of primary interest. It appears most visibly in the bibliography and in the ref- 
erences with which the text is interspersed. For lack of space in the summary, 
the summarizer does not have the slightest chance to mention even a reason- 
able choice from the normally large quantity of quite heterogeneous papers in 
the review article, with the consequence that the possibilities for action are 
limited. Inge states what she can do. She adapts her working plan and begins to 
execute it. The sequence ends with the writing of a header sentence, followed 
by a list of the section headings. It describes the type of the article, its cover- 
age (54 original contributions), and by means of the outline items, the content 
of the paper. 

In detail, the steps present the following activities: 

• Hearn-1: Knowledge activation through bibliographic data 

• Hearn-2: Counting the references. The number of reviewed papers is a key 
feature of a literature review. Since it does not figure in the article, the 
summarizer determines it. 

• Step 3 has been omitted. There, the summarizer wonders about the identity 
of the author. 

• Hearn-4: Reading the first two paragraphs of the article without finding 
useful information 

• Hearn-5: Exploring the outline. The summarizer goes through the whole
article, reading only the chapter headings which state the outline of the
report.

• Hearn-6: Planning an abstract of a review article 

• Hearn-7: Starting to write an abstract. Inge begins her abstract stating ba- 
sic features of the review article and forming a header sentence. 

• Hearn-8: Forming a list from section headings 
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In addition, the Hearn sequence shows flexible document exploration: we can 
observe how the summarizer moves through the article. In the first working 
step, Inge reads the title information. In step 2, we find her at the end of the ar- 
ticle counting references. In working step 4, we meet her again at the begin- 
ning of the paper, now reading two paragraphs in a standard reading attitude. 
The exploitation style changes when in step 5 the summarizer runs through the 
whole article, attending only to the section headings. What she knows after 
step 5 enables her to write the overview sentence of the abstract. 

Inge applies in her target summary a large percentage of the information she 
has acquired - or, to put it differently, she does not read much more than she 
actually uses. 



Working step Hearn-1: Knowledge activation through bibliographic data 



After starting up the working process, Inge activates her background knowledge of the 
field by interpreting the title of the current article and the journal where it has appeared. 

As is natural at the beginning of a new working process, some of the strategies 
in step Hearn-1 (Fig. 4.40) are devoted to starting up the summarizing process: 
start initializes the whole process, start-explore the exploration sequence, and 
explore the current exploration act. Comparable strategy sets can be observed 
in the first steps of other sequences as well. 

In contrast with other steps, the current exploration activity is dedicated to 
core bibliographic data. What is interesting is how the information about the 
journal and the article series helps the summarizer to activate her background 
knowledge and to build up useful expectations. Knowledge activation is en- 
abled by the presence of a journal series familiar to the summarizer. Such a 
background is not always available. When Marliese Gunther interprets the title 
information of an isolated lecture in working step Trueby-1 (Fig. 4.35), she 
cannot refer to a publication environment which stimulates valid expectations 
about the lecture. 

Exploring bibliographic information and knowledge activation. Inge pays 
attention to the bibliographical information of the article: the journal where the 
article has been published, the title, and the name of the author. Among the 
strategies that define her current reading intention, we therefore find biblio - 
determine the data of bibliographic description. It states what Inge looks for. 
Besides that, Inge is guided by her knowledge about document structures 
(strategy by-form). She presumably accepts information for normal under- 
standing (strategy browse). 
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Fig. 4.40. Working step Hearn-1: Knowledge activation through bibliographic data 
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While Inge attends to the bibliographic data, more importantly she activates 
her own background knowledge (strategy background). She knows the journal 
and comparable reviews of earlier years. This helps her to decide about the 
document type “really a real review article” (strategy state-doctype). Subject
analysis, the field which is reviewed, is central in Inge’s own competence. She 
can imagine what the survey should report. So she begins to summarize with 
well-prepared expectations. 

Inge appreciates the core items of bibliographic description (see thinking- 
aloud protocol in Fig. 4.40) as very useful information. She enters them into her 
mental representation of the article. The arguments for doing so are proposed 
by the respective relevance strategies: 

• the title is important (strategy relevant-title) 

• the items of the bibliographic description are relevant (strategy relevant- 
biblio) 

• the bibliographic items expand the document scheme. They can therefore 
be attached by relevant-scheme and are relevant. 

The newly acquired data have been encoded in the document scheme represen- 
tation (see lower scheme and theme windows). The title has been copied to the 
theme representation because it is the standard initial representation of the 
document theme. 



Working step Hearn-2: Some planning and counting the references



Inge adapts her working plan to the summarization of a review. Then she takes the first 
actions, counting the number of references in the bibliography of the paper. 

Whereas in any research paper references are interesting, but not the core of 
the argumentation, in a review article the number of papers which have been 
looked at is a key feature. The review has no new and original work to report. 
Instead, the author reassesses individual contributions in the wider range of the 
whole field. The completeness of her or his overview - expressed in part by the 
number of references - is therefore a weighty quality feature. Inge needs the 
number of references for her abstract (“since we need the numbering” she 
says). She counts them. 

Besides the special interest in the coverage of a review, most descriptive 
cataloging rules foresee that drawings, tables, references, and other special 
sorts of information be listed with their numbers. Abstracting rules may do the 
same. Inge behaves in conformity with them. 

Planning. Inge is well aware how much her own knowledge contributes to her 
summarizing of the subject analysis survey at hand (strategy background). 
Then she becomes practical and plans how to go about it. We learn that she 
has a predefined working plan for review articles (see Fig. 4.41: “usually, in
this, I look first of all at the bibliography”...). She applies it, adapting it to the
current needs (strategy working-plan): she has to count the references. Imme- 
diately afterwards, she states her first move, namely to start out by inspecting 
the references (“...this I do now.” - strategy plan). 

Exploring for counting. A moment later, the working plan is indeed executed. 
We find Inge at the end of the paper, at the bibliography. She uses the outline 
to access the bibliography (strategy by-form). There, she accepts units as given 
by the source text (strategy unit), but only for a shallow perception (strategy 
overview). To count references it is necessary to recognize them but it is not 
necessary to understand them. Inge’s limited information acquisition intention 
is executed by the external readers read-form (which accepts characters with 
their layout features) and read-over which keeps recognition to a minimum. 

Inge cannot simply pick up the number of references, because the journal 
does not number them. She has instead to find their number herself by counting 
entries (strategy count) before she can memorize their number (see Fig. 4.41, 
lower document scheme window) and note it in the margin of the bibliography 
(strategy write). 
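
The count strategy can be illustrated by a minimal Python sketch. It rests on an assumption that is not part of the source: the bibliography is available as plain text in which entries are separated by blank lines. The function name and the sample entries are hypothetical. As in the text, entries only have to be recognized as units, not understood.

```python
import re


def count_references(bibliography: str) -> int:
    """Count the entries of an unnumbered bibliography.

    Assumption: entries are separated by one or more blank lines.
    Entries are recognized as blocks only; their content is not parsed.
    """
    blocks = re.split(r"\n\s*\n", bibliography)
    return sum(1 for block in blocks if block.strip())


# Invented sample bibliography, for illustration only.
sample = "Entry one, spanning\ntwo lines.\n\nEntry two.\n\n\nEntry three.\n"
number_of_references = count_references(sample)
```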

Assessing relevance and encoding. The number of references is an important 
feature of the article and therefore relevant (strategy relevant-doc-feature). It 
links to the document scheme (strategy relevant-scheme - important is what
can be attached to the outline). A product area pops up in the exhibit, because
first material - the number of references - is allocated to the later summary. 



Working step Hearn-4: Reading the first two paragraphs of the 
article 



Working step Hearn-4 demonstrates normal reading for information.

Working step Hearn-4 (see Fig. 4.42) is easy to interpret. We now find Inge at 
the beginning of the paper’s text. Her reading preferences have switched to the 
perception of normal text. She is still guided by by-form, keeping to her
outline-driven exploration strategy; she still advances by text-given units (now
paragraphs - strategy unit), but she now accepts input for normal unrestricted
understanding (strategy browse). The fact that she pays so much attention to
her input is motivated by her position at the beginning of the document 
(strategy first - search through the beginnings of text units). 

The simple reading strategy read-form suffices to realize Inge’s reading in- 
tentions. She finds nothing noteworthy, but she labels the passage in a superfi- 
cial way (“he starts out with the following” - strategy label). 
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Fig. 4.41. Working step Hearn-2: Planning and counting the references 












[Figure 4.42 (screen exhibit): windows for external input, active strategies
(explore), and protocol segment. The external input shows the first two
paragraphs of the article. Protocol segment: “He starts out with the following:
The year 1989 saw the publication of new editions of a number of standard
subject analysis tools, including the 20th edition of the Dewey Decimal
Classification, the 12th edition of Library of Congress Subject Headings, and
the third edition of LC’s Subject Cataloging Manual. The persistence of these
and other longstanding guides to subject analysis does not indicate that the
field is static. Rather, they provide a stable context within which much
thoughtful and inventive work is taking place. What follows is a selective
survey and bibliography of the past year’s library and information science
journal literature relating to that work in the field of subject analysis,
[illegible] control and access. Also included in the bi bibliography are
several surveys similar to this one that appeared in 89 1989: Lancaster and
others, Taylor, and Wollner.”]

Fig. 4.42. Working step Hearn-4: Reading the first two paragraphs of the article

Fig. 4.43. Working step Hearn-5: Exploring the outline
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Similar steps with normal information intake are the step Judge-3 (Fig. 4.27) 
where it pays to inspect the document beginning, the Trueby sequence with 
steps that differ in outcome, and the step Goonatilake-66 (Fig. 4.52). 



Working step Hearn-5: Exploring the outline 



The step Hearn-5 brings the bulk of material for the first sentence of the abstract. Inge
reads the section headings (i.e., the outline of the article). When she writes the
introductory sentence of the abstract (see working step Hearn-8) she will list them to
characterize the scope of the source article.

It is instructive to compare Inge’s work to that of Edward in step Mackin-7 
(Fig. 4.31). Information intake is analogous in both cases, but in step Mackin- 
7, exploring the outline serves informational checking. Another difference re- 
sults from the outline that is exploited. Whereas Inge is confronted with only 
one level of subdivision, Edward follows deeper subdivisions of headings in the 
Mackin paper. 

Exploration of section headings. Inge explores the outline, noting the section 
headings (strategy ex-form). She has restricted her interest to a particular level 
of text organization (strategy level-in), looking only at the section headings 
(strategy heading - restrict perception to headings). She can accept the head- 
ings as units as they are given by her printed input (strategy unit). 

In the given text layout, headings can be recognized because of their 
markup, enabling the use of the strategy skim. 

Skimming for headings is realized by the strategy read-on-demand (read in- 
termittently, on demand) and the basic read-form strategy (compare Fig. 4.43). 
As Inge is concise in her perception of the headings, the idea is that she really 
jumps the text between headings and produces a discontinuous stop-and-go 
reading. In contrast, Edward is assumed to fly through the whole paper in step 
Mackin-7, perceiving all material, although normal text is seen only in a 
dummy style (by the strategy read-over). 

Relevance assessment and encoding. For assessing the relevance of section 
headings, there is one sweeping argument: as parts of the document scheme 
they are relevant (strategy relevant-scheme). The newly explored outline items 
are attached to the document scheme representation. 
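
How the strategies unit, heading, and level-in narrow down what is perceived can be sketched in Python. The sketch assumes that the article is already segmented into marked-up units; the Unit class, its fields, and the function name are hypothetical illustrations, and the sample headings are invented.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    kind: str   # "heading" or "paragraph" (assumed markup)
    level: int  # outline depth of a heading; 0 for running text
    text: str


def explore_outline(units: list[Unit], level: int = 1) -> list[str]:
    """Collect the headings of one outline level, skipping everything else."""
    outline = []
    for unit in units:               # unit: accept units as given by the text
        if unit.kind != "heading":   # heading: restrict perception to headings
            continue
        if unit.level != level:      # level-in: stay on one level of organization
            continue
        outline.append(unit.text)
    return outline


# Invented sample document, for illustration only.
article = [
    Unit("heading", 1, "First section"),
    Unit("paragraph", 0, "Running text that is jumped over."),
    Unit("heading", 2, "A subsection"),
    Unit("heading", 1, "Second section"),
]
outline_items = explore_outline(article)
```

The loop visits only headings of the requested level, which corresponds to the stop-and-go reading described above rather than to a pass over all material.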
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Working step Hearn-6: Planning to produce an abstract 
of a review article 



The step Hearn-6 is devoted to process planning. Inge gets ready to write her abstract.

The large number of individual works listed in a review article cannot be in- 
cluded in an abstract for lack of space. As the summarizer says, the most she 
can do is indicate main trends, which tend to be found in the appropriate sec- 
tions - introduction and summary. 

Inge develops her plan for abstract production (strategy working-plan). Con- 
straints are set by the document type review article. The summarizer states 
how she will deal with them (strategy special-problems), as well as with the 
restrictions imposed on her work by the brevity of the abstract (strategy limits). 
Inge plans what she will do (strategy plan). During planning, she explains her 
decisions (strategy explain). The main planning issue is where to obtain the
information from (strategy source-of-information): in particular from the
introduction and the summary section of the article. Inge decides to save the
exploration of the summary section for later (strategy later).



Working step Hearn-7: Starting to write the abstract 



Inge begins her abstract of the review article. After starting the writing activity explic- 
itly, she develops the plan of her introductory sentence. She decides to begin her sum- 
mary with a characterization of the review, followed by its section headings. 

For readers who want to study how first sentences of summaries are produced it 
is instructive to compare Inge’s activity in this and the following working step 
with that of Edward Cremmins in step Mackin-13 (Fig. 4.33) and of Marliese 
Gunther in step Trueby-2 (Fig. 4.36). 

Beginning to write. As the thinking-aloud protocol shows, Inge begins the new 
working step (Fig. 4.45) with a clear start signal: “Now so we start. This, with 
the abstracting.” The strategies start-abstract (begin to write the abstract) and 
construct (construct a text passage of the abstract) are derived from this. In 
step Mackin-13 (“And I’m starting to write ...”), the same mental process is 
observed. It gives rise to the same start-up strategies. 

Returning to the source paper. When trying to write, summarizers notice 
their memory gaps. In the present case, Inge prefers to look up the number of 
references that she has noted in the margin of the paper. She is driven by the 
endeavor to state important characteristics of the document (strategy
get-document-feature).

Fig. 4.44. Working step Hearn-6: Planning to produce an abstract of a review article
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Fig. 4.45. Working step Hearn-7: Starting to write an abstract 
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Since she knows the page setup and the place where she has jotted down the 
number, she can use her knowledge and guide the eye straight to the lower 
right side of the page (strategy retrieve - find meaning units selectively, using 
the information structure of the document). Optical reading need not do more
than access the position of interest and copy the desired item (strategies
read-find and read-form).

Planning the introductory sentence. Whereas in Mackin-13 and Trueby-2 
the abstract starts with a topic sentence whose content is taken from the theme 
representation, Inge draws the content of her introductory sentence from the 
outline (the scheme representation) and puts it down in a more formal header- 
and-list structure. The list header states important document characteristics and 
announces the list, the list reproduces the section headings of the review. 
Whereas the list itself is not added until step Hearn-8 (see below), content 
planning is all completed in the current step. Otherwise the correct announce- 
ment of the list would be impossible. 

The summarizer obeys a number of principles when she configures the initial 
sentence of the abstract. Given that each principle is advocated by a strategy, 
the general text planner construct-plan needs a considerable set of cooperating
strategies. 

A first group of strategies choose document-related information: 

• The get-document-feature strategy urges the summarizer to state features of 
the document in the summary, such as the audience, an added CD-ROM, 
or the quality of presentation. 

• gist-of-document. By stating the outline of the paper, Inge complies with 
the principle of making the document outline transparent in the abstract. 

• fact-in. Inge follows the strategy fact-in which insists on filling the abstract 
with as many facts, data, names etc. as possible. Four items of fact knowl- 
edge are included in the plan of the first sentence. Inge indicates the paper 
type, the time of coverage, the number of references, and the outline. 

• quantity. According to the strategy quantity, precise numerical data are 
preferred. The summarizer states the year 1989 and the number of refer- 
ences precisely. 

A second group decide on a suitable semantic form, which here happens to
consist of a list and a header:

• list-of-items. Inge presents the paper’s outline as a list. A list is a legiti- 
mate way of information presentation proposed by the strategy list-of-items. 

• header. A good strategy for the presentation of lists is to introduce them by 
a header. Inge foresees such a header (strategy header - represent a list by 
its header or header sentence). 
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Last but not least a strategy draws on a good information source: 

• con-heading. The strategy con-heading recommends drawing abstract pas- 
sages from the headings of the source document. It shows how to realize 
the gist-of-document strategy. 

Formulation by ready-made components. Inge puts ready-made components 
together when she gives her list header a linguistic form. Her sentence matrix 

this w article based on x references from the year y is divided into z sections 

has certainly already served many times to incorporate the core data about a 
review paper. It is drawn from the stock of the pattern strategy. Included are 
three information items: “review”, “1989”, and “54”. They can be entered into 
the matrix sentence without any grammatical adaptation or rearrangement. 
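
Filling the sentence matrix can be mimicked with Python's standard string.Template. This is a sketch only: the matrix wording and the three items are taken from the text, while the function name is hypothetical. Since the text names no filler for z, safe_substitute is used so that this slot stays open instead of being guessed.

```python
from string import Template

# The sentence matrix as quoted in the text, with w, x, y, z as slots.
SENTENCE_MATRIX = Template(
    "this $w article based on $x references from the year $y "
    "is divided into $z sections"
)


def fill_matrix(**items: object) -> str:
    """Enter the given items into the matrix; unfilled slots are left as is."""
    return SENTENCE_MATRIX.safe_substitute(**items)


# The three information items named in the text.
header = fill_matrix(w="review", x=54, y=1989)
```

As in Inge's formulation, the items are entered without any grammatical adaptation or rearrangement.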



Working step Hearn-8: Forming a list from chapter headings 



In working step Hearn-8 Inge continues executing the text plan she formed in step
Hearn-7. She copies the list of headings from the source paper. Some planning and
commenting apart, copying binds all cognitive activity during the current working step.

Intelligent copying. Intelligent copying includes four activities: reading a pas- 
sage, planning its use in the target text, possible reformulation, and writing. 
Writing is simple in the present step, but the other subtasks of intelligent 
copying merit some attention. 

Looking up the headings in the source paper. Whereas Inge had looked up
an individual number (“54”) in the preceding working step, she now
systematically copies the section headings. She knows them in principle from working
step 5, without having memorized their wording. That she in effect rereads 
them is confirmed by the unspecific noises and the leafing noise in the think- 
ing-aloud protocol. 

Her reading intention is almost the same as in step Hearn-5 where she ex- 
plored the headings for the first time. What has changed is the memory situa- 
tion: now that she has a memory trace of the document outline, she can con- 
tent herself with reading and understanding just enough to fill up memory gaps. 
Therefore the explore strategy is accompanied by the refresh strategy which re- 
stricts reading to what is needed for memory update. External reading can be 
done by the read-find and the read-form strategy.

Fig. 4.46. Working step Hearn-8: Forming a list from chapter headings

Planning the headings list. The headings list is easy to plan. The list-of-items
strategy watches over its form, imposing a list structure, and the con-heading 
strategy decides that the list items are taken from the headings of the original 
article. 

Besides the regular planning work, there is an attempt to follow a different
kind of argument (“...maybe I could say here already...” - strategy alternative).
But it is given up.

Formulation. Most of the formulation is mere copying (strategy ready-made). 
The summarizer’s own contribution is limited to connecting the copied state- 
ments by suitable punctuation marks. A colon binds the list to its header 
(strategy connect - connect individual statements to compose texts). The re- 
organize strategy simply puts commas in between the list items. 
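
The formulation step is simple enough to be written down as a short Python sketch. It assumes that the header and the copied headings are available as strings; the function name and the sample values are hypothetical.

```python
def form_list_sentence(header: str, headings: list[str]) -> str:
    """Join a list header and copied headings into one sentence.

    The headings themselves are taken over unchanged (ready-made);
    only punctuation is contributed.
    """
    listed = ", ".join(h.strip() for h in headings)  # reorganize: commas
    return f"{header.rstrip()}: {listed}"            # connect: colon


# Invented sample values, for illustration only.
sentence = form_list_sentence(
    "The review is divided into the following sections",
    ["First section", "Second section", "Third section"],
)
```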



4.3.6.5 The Black sequence: Professional document use
and incremental construction of a macrostatement 

Summarizer: Ingetraud Dahlberg, INDEKS, Frankfurt, Germany 

The Black sequence shows 

• the incremental (i.e., step-by-step) formulation of a macrostatement which 
spans whole sections of the summarized article 

• professional techniques of document use 

In the observed sequence, Ingetraud Dahlberg is exploring an article by Wil- 
liam Black about knowledge-based abstracting. We meet her during working 
step 16. Here she begins to state a macrosentence which spans sections 2
and 3 of the article. By this time, she has already inspected the short résumé of
the author that accompanies the article. She has looked at the references and
the section headings. Then she has begun to explore the text of the article, 
starting with a complete reading of the introduction and lowering her reading 
intensity while inspecting the second section. 

Incremental development of a macrostatement. A macrostatement that 
summarizes two sections of a paper presupposes a long and swift exploration 
move that covers the sections at hand. Such a move occurs in the Black 
sequence. To derive the phrases of her macrostatement, Inge generalizes from 
her exploration results, often finding a label for them from ideas of her own. 
She reworks the knowledge that she has gathered from the original before she 
integrates it into the macrosentence, which will eventually be encoded. The 
macrostatement is attached to the document scheme as soon as it is com- 
pleted, i.e., in working step 19. 
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Professional document use. The dynamic reading techniques demonstrated in 
the Black sequence back up the evidence of the Goonatilake sequence below. 
Together, the two case studies show that advanced strategies of
document exploitation are indeed important in summarization competence and
practice.

Inge presents the following exploration techniques: 

• reading beginnings of paragraphs only (sampling) (Black-16) 

• complete reading (Black-17)

• exploration for cited papers (Black-18) 

• integrated reading of headings, a definition, and cited papers (Black-19) 



Working step Black-16: Sampling through a document
by beginnings of paragraphs 



Inge demonstrates a common sampling technique of summarizers. By reading the begin- 
nings of paragraphs she counts on hitting upon the topic sentences placed there by pru- 
dent authors and thus perceives, in fact, an in-text summary by jumping from one para- 
graph beginning to the next. Inge starts integrating the concepts from the paragraph 
beginnings in a summarizing sentence of her own. 

Of interest for a more detailed discussion are both the exploration technique 
and the production of the macrostatement. 

Document exploration. Sampling through a document implies an operational 
decision to read in a special mode and a technique of discontinuous reading. 

The decision to sample is put forward by a group of four strategies (compare 
Fig. 4.47): 

• sample decides that only text probes are looked at. The current probing 
principle is to accept beginnings of units (defined by unit and first). 

• unit insists on having the input text piecewise (by paragraphs) 

• first concentrates the attention on the beginnings of units 

• by-form keeps up the orientation with respect to the outline. It is needed 
since Inge proceeds section by section. 

The stop-and-go reading is managed here by read-on-demand in cooperation 
with the standard reading strategy read-form. 
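
Sampling by beginnings of paragraphs can be sketched in Python as follows. The sketch assumes plain text with paragraphs separated by blank lines and takes the first sentence of each paragraph as its beginning; the function name and the naive sentence boundary rule (which would fail on abbreviations) are hypothetical simplifications, and the sample text is invented.

```python
import re


def sample_paragraph_beginnings(text: str) -> list[str]:
    """Return the first sentence of every paragraph.

    unit:  the text is taken paragraph by paragraph (blank-line separated).
    first: only the beginning of each unit is accepted.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    probes = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # undo line wrapping
        # Naive rule: a sentence ends at the first '.', '!' or '?' that is
        # followed by whitespace or by the end of the paragraph.
        match = re.match(r".+?[.!?](?=\s|$)", flat)
        probes.append(match.group(0) if match else flat)
    return probes


# Invented sample text, for illustration only.
sample = (
    "First topic sentence. Supporting detail follows.\n\n"
    "Second topic sentence! More detail\non a wrapped line.\n\n"
    "A paragraph without a full stop"
)
probes = sample_paragraph_beginnings(sample)
```

If authors place topic sentences at the paragraph beginnings, the returned probes read like an in-text summary.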

Beginning the macrostatement. Inge’s macrostatement is not a plain copy of 
concepts taken from the text. Rather, Inge states her observations on the 
author’s intellectual moves (“Now he goes on ...”). She elaborates on the docu- 
ment’s content, i.e., she adds her own ideas to its meaning material (strategy 
elaborate), putting a label on the section that states its meaning role (strategy 
label). It functions as the first part of a macrosentence (strategy macrosentence 
- state a global meaning structure of the document). 
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Fig. 4.47. Black-16: Sampling through a document by beginnings of paragraphs 
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Fig. 4.48. Working step Black-17: Plain reading instead of text sampling 
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The macrosentence under construction elaborates the outline. For this reason it 
is relevant according to the argumentation of relevant-scheme. It is encoded 
step by step as it is produced (strategy hold influenced by hold-increment 
which enables processing by several steps). The linkage to the document 
scheme is postponed until the whole macrostatement is complete (in step 19 - 
Fig. 4.50). 



Working step Black-17: Plain reading instead of text sampling



With working step 17, Inge changes to plain reading. She reads the next paragraph 
completely, stimulated by the hint on the part of the author that he is now returning to 
his theme: “Returning to the goal of producing abstracts... ” 

Plain reading. The strategy texthint reacts to the author announcing his return
to the theme. It switches the reading attitude to plain reading. Plain reading is 
defined by the basic perception method browse (normal reading and under- 
standing), combined with unit for delimiting input packages. In the thinking- 
aloud protocol Inge explicitly states her position in the text (compare Fig. 4.48: 
“Now here in his third paragraph...”). She is still guided by the outline and the 
strategy by-form. The reading strategy read-form is sufficient for information 
intake. 

Forming the next unit of the macrostatement. Inge summarizes the para- 
graph by describing its meaning role in formal terms: “So he starts his argu- 
mentation by this” (strategy label - state the meaning role of a text passage). 
Again the summarizer elaborates the text information (strategy elaborate) by 
assigning a label. The result is the next part of the macrostatement under way 
(strategy macrosentence). 

The strategies hold, hold-increment, relevant-scheme, and explore continue 
functioning as in the previous step. 



Working step Black-18: Looking for the sources 

Working step Black-18 again brings a switch in the perception method of the
summarizer. The following paragraph quotes a series of papers. They are cited, as is
often the case, by names of authors and publication year. Inge concentrates her
attention on them and disregards all other features of the passage.



Concentrating on the sources. Cited authors are important for summarizers
who have an active grasp of their field. Citations are the conventional way to 
establish a context (or intertextual relationships) between scientific documents. 
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A summarizer who is well aware of the research work in the whole discipline 
also knows the cited papers and can appreciate the current paper in relation to 
them, for instance stating the progress the paper at hand brings about. Context- 
sensitive document exploration may be an outstanding performance in routine 
summarization, because it demands a considerable scientific horizon and 
steadfastness. However, it corresponds to standard pragmatic strategies of 
scientists reading papers from their domain. 

The context-oriented information intention is defined by three strategies of 
document use (compare Fig. 4.49): 

• intertext (put a paper into the context of related documents) is responsible 
for the intention to explore the relationships of the current paper with other 
documents in the domain. 

• sources (find out the sources used for a paper) decides to identify the 
sources of the current paper, following the citations in the document text 
and the references. In scientific papers, the sources are usually encoded by 
names of authors and publication years, or any suitable abbreviation. Their 
form is prescribed by the authors’ guideline for the journal and is therefore 
regular. 

• select (perceive only text passages that correspond to your current parame- 
ter or pattern) specifies which structures in the text represent the desired 
information so that visual perception can concentrate on the interesting 
units. Anything else is relegated to the background. 

Reading very special targets. Although cited papers are a well-defined intel- 
lectual search pattern, they are a rare reading target. Like other search patterns 
which are configured by select, they are passed on to the “programmable”
reading strategy read-param (read according to an individually specified 
parameter or pattern). It accepts a client-defined pattern and delivers tailor- 
made results, here cited authors. It is combined with read-form for basic input. 
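
The division of labor between select (which defines the pattern) and read-param (which applies it) can be pictured in code. The following Python fragment is only an illustrative sketch, not the model described in this book: the function name read_param, the regular expression for author-year citations, and the sample passage with its invented authors are all assumptions made for the example.

```python
import re

# Hypothetical pattern for author-year citations such as "Miller (1990)"
# or "Smith and Jones (1988)". Real journals prescribe their own forms.
CITATION = re.compile(r"\b([A-Z][a-z]+(?: and [A-Z][a-z]+)?) \((\d{4})\)")


def read_param(text, pattern):
    """Return only the passages that match the client-defined pattern."""
    return [match.group(0) for match in pattern.finditer(text)]


# Invented sample passage; the names and years are placeholders.
passage = (
    "Earlier work by Miller (1990) and by Smith and Jones (1988) "
    "is extended here; compare Lee (1992)."
)
cited = read_param(passage, CITATION)
print(cited)
```

Everything that does not match the pattern is ignored, which corresponds to relegating the rest of the passage to the background.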

Adding the next unit to the macrostatement. The summarizing label “after 
having referred to a number of authors” is arrived at through simple abstraction 
from the occurring cited papers (strategies label and generalization) and added 
on to the half-finished macrosentence. It now reads 

Now he goes on to consider sentence structure and text structure; so he starts his 
argumentation by this and after having referred to a number of authors ... 

Whereas in the earlier steps, the relevant-scheme strategy alone determined 
relevance, it is now complemented by the strategy relevant-cited that insists on 
the relevance of cited papers or authors. 

All strategies that are not discussed work as before. 

Fig. 4.49. Working step Black-18: Looking for the sources

Working step Black-19: Combined exploration mode and
termination of the macrosentence



Inge shows a sophisticated combined exploration mode aiming at headings, possibly 
with attached information, and cited papers. She finishes the macrosentence and links it 
to the document structure knowledge. 

Combined exploration mode: headings and cited papers. From the think- 
ing-aloud protocol we learn that Inge has adapted her perception behavior 
again. She continues to refer to literature references but she also notes head- 
ings and subheadings which are allowed to have optional extensions. 

A big team of strategies combine to define her reading intention (compare 
Fig. 4.50): 

• intertext (elaborate intertextual meaning relations). Referencing earlier 
papers is a way to establish connections between the paper at hand and 
other work. Following these relations helps a summarizer to appreciate 
above all what is new and the real contribution in the current paper. 

• sources (consider a document with respect to the sources (documents or 
authors) where it takes its knowledge from). Cited authors and papers are 
worth looking at because they make it possible to reconstruct the sources a 
paper builds upon, and because they set up some context for the interpreta- 
tion of the paper at hand. 

• by-form (keep to the document structure). The by-form strategy corresponds 
to the observation that the summarizer is aware of the document outline. 
Otherwise she would not be able to interpret headings. 

• heading (restrict perception to headings). The summarizer currently looks 
only for headings. 

• browse (normal reading for understanding). Normal reading for understand- 
ing occurs when Inge reads the definition of abstracting that immediately 
follows the heading announcing the abstracting techniques. 

• first (search through the beginnings of text units). Reading for understand- 
ing is restricted to the beginning of the section about abstracting by the 
strategy first. 

• select (perceive only text passages that correspond to your current parame- 
ter). The select strategy is fed by the strategies which contribute to the 
definition of the reading intention. It constructs a combined data interpreta- 
tion pattern that tells the external reading strategies to access cited authors 
and headings with neighboring text. 

Fig. 4.50. Working step Black-19: Combined exploration mode and termination of
the macrosentence

The reading intentions are executed by two reading strategies: read-form
recognizes items with their layout, and read-param deals with a special ad hoc
pattern integrating headings, normal text, and cited authors, which has been
defined by select.

The information acquisition results in three literature references (Simmons 
85, Earl, and Bonzi and Liddy), two new headings (the “keyword method” and 
the “indicator phrase method”), and a definition of abstracting. 

Finishing the macrostatement. Inge notes the cited authors, the already 
known heading of the chapter (“abstracting techniques”), and the two subhead- 
ings (“keyword method” and “indicator phrase method”) in an integrated form. 
She elaborates additional meaning relations (strategy elaborate), above all the 
difference between keyword method and indicator phrase method. She recon- 
structs the approach of the author (“he goes ... he refers”). She makes a mental 
note of this relation, as becomes clear from its later use in the abstract. The 
author’s definition of abstracting (“abstracting is to extract from a text the most 
representative sentences”) disappears. 

Before the exploration results are integrated in the macrosentence (through
the strategy macrosentence), their relevance must be judged. The strategy
relevant-cited pleads for the relevance of the cited papers; relevant-heading,
relevant-scheme, and relevant-formhint insist with different arguments on the
importance of the newly acquired headings. Evidently no strategy advocates the relevance of the
definition. The relevance judgement in favor of the whole macrosentence is 
provided by the strategy relevant-scheme, which recognizes information 
elements as relevant and worth retaining because they elaborate the document 
scheme. 

The macrosentence elements are entered in the document scheme by the 
hold strategy. With the completion of the macrosentence in working step 
Black-19, the activity of the strategy macrosentence is terminated. The goal of
the strategy hold-increment has been achieved as well. In the document 
scheme representation, we find the representation of the macrosentence. 



4.3.6.6 The Goonatilake sequence: Dynamic reading techniques 
Summarizer: Edward Cremmins, Rockville, MD 

The Goonatilake sequence demonstrates the strategic reading techniques which
are almost indispensable to a summarizer. Dynamic reading techniques,
techniques of information use, or document exploration techniques are strategic
behaviors that apply “normal” or basic reading in order to reach a goal, for
instance learning, summarizing, or decision making. While we have no generally
accepted name for these intellectual techniques, there is no doubt about
their use.

While the Goonatilake sequence presents document exploration techniques
most extensively, it is not the only one in this collection to do so. Readers are 
invited to compare the techniques used here by Edward Cremmins to those ap- 
plied by Ingetraud Dahlberg in the Black sequence. 

As a secondary point of interest, the sequence allows us to observe incre- 
mental relevance assessment. In the first paragraphs of Chapter 7, Edward has 
hit on a summary of the whole book (step 66). During his rereading of the pas- 
sage (step 69), he convinces himself that he has indeed stumbled on such a 
piece of luck for a summarizer. 

Overview of the Goonatilake sequence. The complete summarizing process
dedicated to Susantha Goonatilake’s book Aborted discovery: Science and
creativity in the Third World comprises 140 working steps. Of these, seven steps are
presented here, namely steps 65, 66, 69, 91, 93, 95, and 96. During these steps,
the summarizer explores the end of the book, namely Chapter 7 Future 
pathways, degrees of leeway and its last section In lieu of a conclusion. The 
large interruption between working step 69 and step 91 is due to an error repair 
move which is not presented here: Edward remembers that he skipped the 
preface. He returns to the beginning of the book, and after making up for his 
omission, he comes back to Chapter 7 in working step 91. At this moment, we 
meet him again. Some individual steps, such as step 94, have been left out for 
lack of interest. 

Reading intensity varies greatly. The first passage of Chapter 7 is treated in 
three steps (66, 69, and 91). Edward will return to it during steps 99, 101, 102, 
and 120-124. He will use it in his abstract. In contrast, Edward glances through 
the whole body text of the chapter in only one working step. This corresponds 
to a very economic and shallow information acquisition. In the middle of the 
last section, Edward takes more probes from his source text, looking now at 
paragraph beginnings and achieving a medium depth of penetration. 

The chosen steps present the following techniques of dynamic reading and 
task-oriented information use: 

• Goonatilake-65. During the systematic chapter-by-chapter-exploration of a 
book, Edward reads the chapter heading and the number of the first page of 
Chapter 7. 

• Goonatilake-66. Edward pays attention to the beginning of the chapter, 
reading the first paragraph completely. 

• Goonatilake-69. While staring at the beginning of the chapter, Edward 
identifies it as an in-text summary and decides about its relevance. 

• Goonatilake-91. Back at the beginning of Chapter 7, Edward rereads the
first paragraph asking for “the purpose of the book” in the topic sentence of 
the abstract. 

• Goonatilake-93. A summarizer at shallow and high-speed exploration, in- 
specting 26 pages at one go. 

• Goonatilake-95. Medium-speed exploration with medium sampling density.
Reading beginnings of paragraphs, Ed is expecting to hit on topic
sentences.

• Goonatilake-96. Edward reads the first paragraph of the final section In lieu 
of a conclusion, without a result. 

The comments focus on document exploration techniques. Other activities are 
explained with less emphasis. In the graphical display of the working steps, the 
document exploration strategies are put in a dotted box. 



Working step Goonatilake-65: Chapter 7 - heading and page number



We meet Edward during his systematic exploration of the book, precisely at the moment
when he turns his attention to Chapter 7 “Future Pathways: Degrees of Leeway” and
notes it in his logbook as known.

From his long-range behavior we see that Ed works his way systematically 
through the outline which he knows from earlier exploration steps. In the
current step, Edward is after “technical” information, identifying the next chapter
by its heading and location in the book, namely on page 142. Whereas we must 
assume that in exploring the book chapters, Edward follows his large-scale 
general working plan, he now follows a small-scale plan. He wants to know the 
first page of the chapter for practical reasons of orientation and bookkeeping. 
Here he is using the location strategy that urges summarizers to explicitly state 
the position of an information unit. Edward notes the page number in his 
logbook. 

Reading a heading and a page number. Reading both chapter heading and 
page number is special since the two information items are expected at differ- 
ent locations: the heading at the beginning of the chapter inside the type area, 
the page number in the margin of the printed page, for instance at the outer 
lower edge. Since Ed is already at Chapter 7, he is familiar with the page setup 
of the book. 

Exploring the heading and the page number requires two reading acts 
(compare Fig. 4.51). They follow different principles: 

• The heading is most probably read with a normal sequential access to the 
main text. It is recognized by its layout features. 

• No sequential reading is needed for accessing the pagination. A skilled 
summarizer is aware of the page setup and knows the position of the page 
number. Reading jumps directly to it. 

A group of six strategies cooperate to stipulate the aims of the two necessary 
acquisition acts. Three of them state general conditions of the acquisition acts: 

• by-form knows the document organization. At the moment of step 65,
Edward already has a good overview of the document outline. From earlier
explorations he knows the titles of the chapters. Currently, by-form points at 
Chapter 7. 

• unit insists on accepting only one information package at a time from the 
source document. In the current step, this is the title and the page number. 

• first attracts above-average attention to the beginning of units, here to the 
beginning of Chapter 7. It is motivated by the experience that authors tend 
to produce important statements right at the beginning of their document. 

Another group of three strategies defines the search aims and methods: 

• The strategy heading describes what Edward currently wants to see, 
namely a chapter heading. 

• The search strategy searches a text sequentially. Edward simply goes for- 
ward in the text until he encounters the interesting chapter heading. 

• retrieve governs document structure knowledge for retrieval support. The 
page number can be accessed directly by the structure-driven retrieve 
strategy since its regular position and form is known. 

Three strategies are entrusted with optical reading: 

• read-find looks for a well-defined target. It stops as soon as it has located 
the search goal. 

• read-free interprets a page in a direct access style. It jumps straight to
the target, in the present case to the page number.

• read-form goes through the text sequentially and copies the target passage 
into the document surface representation. 
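
The contrast between sequential search for the heading and direct access to the page number can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of the strategies: the Page class, the function bodies, and the contents of the preceding page are invented, while the heading text and page 142 are taken from the working step.

```python
from dataclasses import dataclass


@dataclass
class Page:
    body: list   # lines inside the type area, in reading order
    number: int  # page number printed at a known position in the margin


def read_find(pages, is_target):
    """Sequential search: stop at the first line that satisfies is_target."""
    for page in pages:
        for line in page.body:
            if is_target(line):
                return page, line
    return None, None


def read_free(page):
    """Direct access: the position of the page number is known in advance."""
    return page.number


pages = [
    Page(["... end of Chapter 6 ..."], 141),  # hypothetical preceding page
    Page(["7 Future Pathways: Degrees of Leeway", "first paragraph ..."], 142),
]
page, heading = read_find(pages, lambda line: line.startswith("7 "))
location = read_free(page)
print(heading, location)
```

The search loop has to pass over everything that precedes the heading, whereas the page number is fetched in one move once the page is known.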

The heading itself is already known and therefore of no particular interest, 
whereas the page number is useful for the summarizer’s further navigation in 
the document. 

The logbook entry. The page number requested by the strategy location is re- 
corded in the summarizer’s logbook (strategy login for the logical entry in the 
logbook area, and strategy write for producing the external note). 

While most summarizers take notes, the use of a logbook that reports the 
state of his processing is Edward’s personal style. A logbook is useful because 
often a summarizer interrupts a subprocess by will or by necessity. So Edward 
must bridge 20 working steps from the moment when he discovers an in-text 
summary at the beginning of Chapter 7 (working step 69) until he comes back 
to further analyze it (step 91). 

Fig. 4.51. Working step Goonatilake-65: Chapter 7 - heading and page number

Working step Goonatilake-66: Reading the first paragraph
and starting relevance assessment

Edward executes an act of common full text reading. 

In working step 66, Ed reads the first paragraph of Chapter 7. Analogous
“normal” information acquisition acts are found in steps Trueby-2, Trueby-3,
Trueby-4, Trueby-5 (Figs. 4.36 - 4.39), Hearn-4 (Fig. 4.41), and Black-17
(Fig. 4.48). Ed goes on reading the next two paragraphs in working steps 67 and
68. Since they bring no new insight, the latter two working steps are not
discussed.

The second interesting feature of the working step is the start of an incre- 
mental relevance assessment move. Readers may compare it to the lengthy ac- 
quisition of a macrostatement in the Black sequence. 

Reading the first paragraph. In comparison to the preceding working step, 
Edward’s information acquisition intention has changed. His current behavior of 
normal reading is characterized by a familiar strategy configuration: 

• by-form keeping Edward’s orientation in the outline. He is still aware of 
being at the beginning of Chapter 7. 

• unit restricting input to the next text unit, here a paragraph 

• first still focusing attention on the beginning of Chapter 7

• browse as the strategy for full text reading and interpretation 

One physical reading strategy - read-form - suffices to copy the desired text 
passage to the document surface scheme. 

Beginning to decide about relevance. The newly acquired text passage pre- 
sents quite striking relevance indicators: 

• It is found at the immediate beginning of a chapter, i.e., of a major text 
component and is thus possibly relevant according to the strategy relevant- 
unit. 

• It contains the author’s indicator phrases for topic sentences (“I have described
in the previous chapters...”, “I have contrasted...”, “I have demonstrated...”)
and is thus relevant in the eyes of the strategy relevant-texthint.

• It links with the theme, which deals with “science and creativity in the 
Third World” by adding new information both to “science in the Third 
World” and its creativity. Thus we find an elaboration of the theme. The 
strategy relevant-call must find it relevant. 

Fig. 4.52. Working step Goonatilake-66: Reading the first paragraph and starting
relevance assessment

In spite of these hints, Edward does not yet react. He postpones the relevance
decision until step 69 (see below). Indeed it is normal to hesitate a bit when 
convincing oneself of the importance of a finding. The strategy hold-increment 
mirrors this behavior. It allows a relevance assessment act to extend over sev- 
eral working steps. 



Working step Goonatilake-69: Deciding about relevance 
and reading for memory support 



In working step Goonatilake-69, most of the intellectual energy is absorbed by think- 
ing, first about relevance assessment and later about planning. Reading supports the 
memory representation of the text passage under consideration. 

Writing (in particular copying) is an obvious situation where rereading for re- 
freshment of memory traces becomes necessary (compare Figs. 4.33 and 4.46). 
The same is true for reinterpretation (see step Goonatilake-91 - Fig. 4.54 be- 
low), and for thinking about relevance. 

Reading for memory support during relevance assessment. Rereading as
observed in the current working step (see Fig. 4.53) means that the summarizer
is “staring at” a text passage that he knows, i.e., he already has a more or less
complete mental representation of it. From the external representation he
recovers what is missing in memory. Since the inner and the external
representation mirror each other, it is hard to decide what has been taken from
memory and what has been read anew.

We observe reading for memory support as a help function during relevance 
assessment. For this reason, hold, the control strategy for relevance assess- 
ment, is at the root of the strategy network. While Ed slowly wins confidence 
in the utility of the passage under consideration, the hold-increment strategy 
remains at work. 

The control strategy for document exploration, explore, is put into a back-up 
mode by the adjoining strategy refresh. It restricts reading to complementing 
the mental representation by recovering details from the external representation 
if needed. 

The reading intention is simply to keep an eye on the pertinent passage 
(strategy overview). It causes the visual system to switch to the “staring-at” or 
“stand-by” state with no systematic information acquisition. The reading strat- 
egy read-over executes the instruction and yields an underspecified shallow 
image of document passages, something like a pixel graphic. Strategy read- 
form is used in the case of real information intake. 

Fig. 4.53. Working step Goonatilake-69: Deciding about relevance and reading for
memory support

Completing incremental relevance assessment. In the fourth step dedicated
to the passage, Ed is definitively convinced that he has found a highly relevant
piece of text. Now he draws his conclusions: “Here we have the summing up of
the whole book,” he says, and ratifies his relevance decision.

A big cast of five relevance strategies is mustered to judge the importance
of the paragraph under discussion. The strategy relevant-summary argues most
comprehensively, integrating the judgements of the other four. Edward can recognize the
summary because of

• the author’s indicator phrases (interpreted by relevant-texthint)

• the references to the chapter organization of the book (delivered by rele- 
vant-scheme) 

• the elaboration relations to the pre-existing theme representation (checked 
by relevant-call) 

• the positive vote of the above-mentioned three strategies for a passage 
that, in addition, consists of more than one statement (established by rele- 
vant-summary) 
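
The way relevant-summary integrates the other votes, as listed above, can be written down as a simple rule. The sketch below is a hedged illustration only: the function signature, the dictionary of votes, and the statement count of three are assumptions, and the rule covers just the three supporting strategies named in the list.

```python
SUPPORTERS = ("relevant-texthint", "relevant-scheme", "relevant-call")


def relevant_summary(votes, statement_count):
    """Accept a passage as an in-text summary if all three supporting
    strategies vote for it and it consists of more than one statement."""
    all_agree = all(votes.get(name, False) for name in SUPPORTERS)
    return all_agree and statement_count > 1


# Assumed votes for the first paragraph of Chapter 7.
votes = {
    "relevant-texthint": True,  # author's indicator phrases
    "relevant-scheme": True,    # references to the chapter organization
    "relevant-call": True,      # elaboration of the theme
}
is_summary = relevant_summary(votes, statement_count=3)
print(is_summary)
```

A single dissenting or missing vote, or a passage of only one statement, makes the rule fail, which matches the conjunctive wording of the last list item.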

The strategies relevant and label make sure that the relevance of the passage 
is explicitly stated. Edward qualifies the passage as “the summing up of the 
whole book...”. 

The relevant statements are now entered into the document scheme and 
theme representation (see Fig. 4.53). 

Some error recovery planning. Edward turns to some error recovery plan- 
ning, remarking that he has failed to look at the preface (strategy omission). He 
states what he intends to do next (strategy plan), adapts the working-plan 
(strategy working-plan) to fill the omitted task in, and then plans what to do 
after treating the preface (strategy look-ahead), namely return to his interrupted 
work at page 142. This is noted in the logbook and written down as an external 
note (strategies login and write). 
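
The replanning move can be pictured as a small edit on an ordered task list. This is a rough sketch under stated assumptions: the task labels and the function recover_omission are invented for illustration; only the skipped preface and the return to page 142 come from the protocol.

```python
def recover_omission(working_plan, omitted_task, resume_task):
    """Put the omitted task first and schedule the return right after it.

    The original plan is left untouched; a revised copy is returned.
    """
    return [omitted_task, resume_task] + list(working_plan)


# Hypothetical remaining tasks at the moment of the interruption.
working_plan = ["explore body of Chapter 7", "explore last section"]
revised = recover_omission(working_plan, "read preface", "return to page 142")

# The logbook keeps the immediate plan and the look-ahead as external notes.
logbook = [f"next: {revised[0]}", f"then: {revised[1]}"]
print(revised)
print(logbook)
```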



Working step Goonatilake-91: Task-oriented reinterpretation 
of the first paragraph

Edward comes back to Chapter 7 and reads its beginning for a third (and not for the last) 
time, looking now for the purpose of the book. 

In contrast with the “staring-at” style rereading in working step 69, the current 
step (compare Fig. 4.54) is characterized by an attentive reinterpretation from 
a task-oriented view: Edward looks for a statement about the purpose of the
book, which is a standard component of a topic sentence. He must already 
have some rudimentary text plan for his abstract in mind. Finding material for 
the purpose statement is his current endeavor. As this is quite different from 
backing up memory during relevance assessment (in step 69), different strate- 
gies are called upon for setting exploration intentions. 

Fig. 4.54. Working step Goonatilake-91: Task-oriented reinterpretation of the first
paragraph

Besides the main aim of reinterpreting the summary, the exploration intentions
must also manage the jump back to page 142 by which Edward returns to the 
main track of activities. 

Planning and reaching the place of action. At the beginning of working step 
91, Edward returns from the beginning of Chapter 1 to the beginning of Chapter 
7 on page 142. First, we hear him plan: “I think I’m going to go back...” He
states explicitly what he is going to do next (strategy plan) and changes the 
working plan (strategy working-plan) not only by entering a jump back to 
Chapter 7, but also by redefining his goals. At the target position, he wants to 
find “what’s there on the purpose of the book”. Immediately afterwards, the 
new plan is executed. 

To find page 142, we can assume an informed search, specified by the re- 
trieve strategy and using the page setup. Visual reading is executed by the 
read-find and read-form strategies; read-free accounts for free jumps. 

Minimal planning of the topic sentence. Some abstract planning must have 
taken place. Evidently Edward anticipates an abstract starting with a topic sen- 
tence (following the strategy topic-first - begin the abstract with a topic sen- 
tence) which mentions the purpose or scope of the document. We must assume 
that within the scope of the text construction strategy construct, the strategies
construct-plan (build a text plan without considering expression) and topic, 
which introduces the patterns and content items of topic sentences, have pre- 
pared the content structure of a topic sentence. 

For this topic sentence, Ed is now seeking a purpose description from the 
source document. 

Reinterpretation of the book summary. Edward’s current information-seeking
goal is to find in the source document an item that fills the purpose or
scope position of the topic sentence (strategy get-filler - find content for a 
planned meaning component). 

As soon as Edward has arrived at page 142, he reads full-text, looking for a 
statement about the author’s intention or purpose of the book. This attitude is 
specified by the two strategies browse (normal reading for understanding) and 
search (sequential searching). Since Edward is rereading known material and 
shows no specific reaction, nothing new is entered into his document scheme 
and theme representations. He says “ahhdadadada”, but we learn nothing about 
his reinterpretation. 

For optical reading read-form (normal reading with character and layout re- 
cognition) suffices. 

Working step Goonatilake-93: Reading 26 pages at one go



In working step 93, a dynamic and task-oriented reader quickly deals with a long text 
passage. Instead of the body text, Edward perceives headings only. 

In contrast to the intensive interpretation of the in-text summary, Edward now 
lessens the attention that he devotes to his input. The metacognitive strategy 
economy governs Edward’s shallow exploration style focusing on headings. 
Headings can be identified quickly and give a shorter characterization of the 
content than the body text. 

Economic information acquisition - from heading to heading. The inten- 
tion to explore a chapter only at the headings level is realized by the following 
strategies: 

• Edward restricts reading to headings (strategy heading). As headings an- 
nounce the content of the text body, he can thus expect to glean central 
concepts. 

• When stepping through the headings, Edward is all the time aware of the 
organization of the book. This is reflected by the strategy by-form that 
brings in document structure knowledge. 

• Perception units are accepted as found in the source document (strategy 
unit). In the present working step the accepted units are headings. 

• No complete perception of the chapter is intended - on the contrary. The 
acquisition mode can be best characterized as “hopping from heading to 
heading”. This behavior can be compared to a systematic sampling tech- 
nique, here achieved by picking one heading after another with the strat- 
egy sample (explore a text passage by sampling). 

• Reading headings can be redefined in layout features as reading high- 
lighted elements (strategy skim). This visual specification is easier to han- 
dle for external reading strategies. 

Optical reading needs two strategies: 

• To implement the reading procedure hopping from heading to heading the 
read-on-demand strategy for intermittent reading is used. It turns the read- 
ing function off as soon as it has reached a target, and waits for a new 
starting instruction. 

• Real reading, i.e., copying the headings from the external source to
memory, is done by the basic strategy read-form.
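
The intermittent behavior of read-on-demand resembles a generator that delivers one target and then waits. The following sketch is an assumption-laden illustration: the function, the boolean flag standing in for layout highlighting, and the placeholder body lines are invented, while the three headings are taken from the chapter under exploration.

```python
def read_on_demand(lines):
    """Yield one highlighted element per request, then pause until asked again."""
    for text, highlighted in lines:
        if highlighted:
            yield text  # the generator suspends here until the next request


# Each line carries a flag standing in for layout highlighting (strategy skim).
chapter = [
    ("The Current Context of Science in the West", True),
    ("body text ...", False),
    ("Loss of Certainty", True),
    ("body text ...", False),
    ("In Lieu of a Conclusion", True),
]

hop = read_on_demand(chapter)
first = next(hop)  # one starting instruction, one heading
rest = list(hop)   # repeated instructions until the chapter is exhausted
print(first, rest)
```

The body text is never returned, so the cost of the exploration depends on the number of headings rather than on the length of the chapter.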

[Figure content. Document surface: The Current Context of Science in the West; Loss of Certainty; Loss of Certainty in the Sciences. Document scheme: Conclusion; The Current Context of Science in the West; Loss of Certainty; Loss of Certainty in the Sciences; Foraging and Legitimizing; At a Less Abstract Level; In Lieu of a Conclusion.]

Fig. 4.55. Working step Goonatilake-93: Reading 26 pages at one go 

Assessing relevance and encoding in the scheme representation. Edward
stores at least the last heading In Lieu of a Conclusion. He falls back on it in 
the next working step. A probable hypothesis is that he expands his knowledge 
of the outline and links all the headings to the document scheme representa- 
tion. This interpretation has been used in the exhibit. The relevance strategies 
argue there with respect to the scheme representation: 

• What is highlighted by layout features is possibly relevant (strategy rele- 
vant-formhint). 

• Headings are relevant (strategy relevant-heading). 

• What can be attached to the outline is relevant (strategy relevant-scheme). 

Readers who want to compare the present working step to a similar one are re- 
ferred to step Hearn-5 (Fig. 4.43). 



Working step Goonatilake-95: Reading topic sentences 



Edward demonstrates a frequent sampling technique of medium intensity, namely the 
perception of first sentences of paragraphs. 

Edward assumes that well-constructed paragraphs start with a topic sentence. 
His goal is to hit them by reading the initial sentences of paragraphs. A similar 
but not identical exploration technique can be seen in step Sperl-18 (Fig. 4.63 
below). 

In step 94 (not presented here), Edward has read the beginning of the last
section In Lieu of a Conclusion completely. The heading promises a summary,
but the section comprises almost two pages and is too long to be read com- 
pletely. Since he is now dealing with the middle part of the section, he lowers 
his input rate to one sentence per paragraph, to the first and presumably most 
informative one (compare Fig. 4.56). 

Text sampling using topic sentences. The intentions for information intake 
are set by strategies relying on text features and other strategies that look at 
practical input management: 

• topic-sentence states the central perception intention of the summarizer. 
Edward wants to obtain the topic sentences at paragraph beginnings in or- 
der to have a short representation of the content. 

• As reflected by the strategy by-form, Edward is aware of the document or- 
ganization. In particular he knows that he is dealing with the concluding 
last section of the book. 

• He is just as aware of his position in a conclusion, i.e., in the upper level 
of information organization of the book (strategy top-level). Edward knows 
that conclusions often summarize the document, and pays special attention 
to the conclusion at hand. 

Fig. 4.56. Working step Goonatilake-95: Reading topic sentences

[Figure content for Fig. 4.57]

document theme: Aborted discovery: science and creativity in the Third World

document scheme: Future Pathways: Degrees of Freedom

active strategies: top-level, by-form, browse, last, unit

protocol segment:

The world of the intellect is one of imagination, uncertainty and playfulness. This book is
about the entry into such a world, into such a world, the world of creativity in science. It
expresses the belief that Third World scientists should engage in conceptualizing, playfully
juggling with ideas and trying to catch their own glimpses of physical reality behind the
veils that hide it. There may be no ultimate physical truths, although temporary thrills may
be obtained by imagining that one's latest discovery is the ultimate truth. The exhilaration
is in the process of discovery and in discourse with nature, it is in the deep disappointments,
the long silences and the occasional words. It is also in relating to nature to tame her so that
she can deliver bounty to man. History demands that we in the Third World should now
re-enter into our discourse with nature, a discourse sometimes punctuated with silence, but
also with knowing smiles and laughter. In lieu of a conclusion, hey.

document surface: [Science is ... from the scientist’s view] The world ... smiles and laughter

Fig. 4.57. Working step Goonatilake-96: “In lieu of a conclusion, hey.”



• sample (explore a text passage by sampling) defines the exploration tech- 
nique. In the present case, it accepts first sentences of paragraphs. 

• unit partitions input into manageable units proposed by the document. 
Here, paragraphs are the input units. 

• first keeps attention focused on the beginnings of units. 

The external reading strategies work in a team of two: 

• read-form copies text with layout features. 

• read-param deals with non-standard reading demands, such as the demand 
to discover topic sentences. It takes here the individually tailored input 
specification identifying first sentences of paragraphs (see step Black-19 - 
Fig. 4.47 for another use of read-param). 
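
The interplay of sample, unit, and first can be pictured with a minimal sketch. It is not the simulation system described in this book; the function names and the regular expressions are illustrative assumptions. Paragraphs are taken to be separated by blank lines, and a sentence is taken to end at the first period, exclamation mark, or question mark, so abbreviations such as "i.e." would cut a sentence short.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Strategy 'unit': use the paragraphs proposed by the document as input units."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def first_sentence(paragraph: str) -> str:
    """Strategy 'first': keep only the beginning of a unit, here its first sentence."""
    flat = " ".join(paragraph.split())  # undo hard line breaks inside the paragraph
    match = re.search(r"(.+?[.!?])(?:\s|$)", flat)
    return match.group(1) if match else flat


def sample_topic_sentences(text: str) -> list[str]:
    """Strategy 'sample': explore a passage by accepting one sentence per paragraph."""
    return [first_sentence(p) for p in split_paragraphs(text)]
```

Applied to a section of several paragraphs, the sketch returns one candidate topic sentence per paragraph, which corresponds to the reduced input rate Edward chooses for the middle part of the section.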



Working step Goonatilake-96: “In lieu of a conclusion, hey” 



Edward explores the end of the book. He reads it completely and determines the role of 
the passage: “In lieu of a conclusion, hey.” Then he goes on. 

We observe a working step dedicated to normal information acquisition. Most 
of the following intentional strategies will be familiar to the reader: 

• by-form keeping Edward’s orientation on the outline. The summarizer 
knows that he is at the conclusion, a privileged place in the document 
structure. 

• top-level insisting on the high information value of high-level components 
such as the conclusion and restricting information intake to them 

• unit limiting input to the next text paragraph 

• last concentrating the attention on the end of a unit, here of the whole 
book 

• browse as the strategy for sequential full text reading and interpretation 

At the level of operational reading the act is unproblematic. The strategy read- 
form copies everything desired to the document surface representation. 

Only later - in working step 135 - does Ed return to this passage and use it 
for his abstract. Relevance decisions will be taken there. 



4.3.6.7 The Rada sequence of pragmatic indexing 
Summarizer: Harold Borko, Los Angeles, CA 

The Rada sequence shows pragmatic indexing techniques. Harold indexes Roy 
Rada’s paper Maintaining Thesauri and Metathesauri, presenting well-known 
practices: 

• he draws indexing terms out of the paper’s title (step Rada-2) 
• he checks whether a concept is well-defined (step Rada-3) 

• he exploits the optical impression of a page (step Rada-5) 

• to ease retrieval of collateral concepts, he adds a broader term (step 
Rada-12) 

Some of the practices, such as taking the indexing terms from the title, may at 
first glance seem extremely pragmatic. Copying the words of the title may be 
criticized as superficial and lazy, but checking the definition of a concept, or 
opening the view to image interpretation instead of normal character reading in 
no way offends good intellectual habits. On the contrary, it shows versatile 
strategies of document use which may not be at everybody’s disposal. 

The Rada sequence also demonstrates free-term indexing as opposed to in- 
dexing bound to a controlled vocabulary (as done in the Sperl sequence). The 
summarizer keeps to the rules for forming indexing terms, but the wording is 
his. It follows the use in the domain. In particular, the wording need neither be 
identical with the author’s nor given in a particular thesaurus. Forming a free 
indexing term is performed by the strategy get-index-term. 



Working step Rada-2: Indexing by exploiting the title 



In working step Rada-2 Harold proceeds to indexing. At this moment the only 
knowledge he has about the paper is its title. That is sufficient security for him: “Ehh okay, 
the title of this article is maintaining thesauri and metathesauri. So clearly, thesauri has to 
be an (...) index term”. 



Indexing from the title. Exploiting the title (compare Fig. 4.58) is an indexing 
strategy as common as it is obvious (ind-title - index with concepts from the 
title). It is better founded than it is often thought to be. The summarizer knows 
which concepts are important in the field. If these important concepts figure in 
a title, so runs the argument behind the strategy, they are relevant to the field 
and at the same time to the article, since the author put them there. 

Rereading for memory enhancement. The summarizer reads the wording of 
the title again while deciding about his first index term. Current exploration 
strategies are at work: 

• explore under the influence of refresh realizes rereading for memory 
backup. 

• browse suffices as modality of information intake, no specific restrictions 
are needed. 

• unit partitions input as usual according to the source document organiza- 
tion. 

• by-form keeps up orientation in the document structure. 
Fig. 4.58. Working step Rada-2: Indexing by exploiting the title 

Fig. 4.59. Working step Rada-3: “What is metathesauri?” - indexing concepts must 
be defined 

Relevance assessment. Harold is certain of the relevance of concepts from the 
title (strategy relevant-title). No further arguments are necessary. He takes 
Thesauri, the first object term of the title, and writes it down (strategy write). 
Thesauri shows up in the theme representation as (partial) paraphrase or 
elaboration of the theme. The product area records it as written index term. 
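
The argument behind ind-title and relevant-title can be sketched as follows, under stated assumptions: the indexer's knowledge of which concepts matter in the field is reduced to a plain set of single-word concepts (`field_concepts`, a hypothetical stand-in), and matching is done on whole words of the title. The function name is illustrative, not part of the model described here.

```python
import re


def index_from_title(title: str, field_concepts: set[str]) -> list[str]:
    """Return the field concepts that occur as words in the title, in title order.

    Assumption: concepts are single words; multi-word concepts would need
    phrase matching, which this sketch leaves out.
    """
    known = {concept.lower(): concept for concept in field_concepts}
    terms: list[str] = []
    for word in re.findall(r"[a-z]+", title.lower()):
        if word in known and known[word] not in terms:
            terms.append(known[word])
    return terms
```

For the title Maintaining Thesauri and Metathesauri and a field vocabulary containing Thesauri and Metathesauri, the sketch yields both candidates in the order in which Harold takes them up in steps Rada-2 and Rada-3.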



Working step Rada-3: “What is metathesauri?” - indexing concepts 
must be defined 



The title offers “metathesauri” as the next candidate indexing term, a doubtful candidate 
in the indexer's eyes. Before accepting it, Harold checks it for an adequate definition. He 
searches the article to find out what “metathesaurus” means. 

Harold demands of an indexing term that it be well-defined and useful in the 
retrieval system, so that anybody looking up metathesauri in a concrete situa- 
tion can be certain of finding a relevant article on the subject. 

Checking that an indexing concept is defined. Harold has drawn his next 
potential indexing concept from the title (strategy ind-title, as in the preceding 
working step, Fig. 4.58). It reads “Metathesauri” and raises the suspicion of not 
being well defined. Therefore the indexer checks its definition (strategy 
ind-defined, see Fig. 4.59). 

Questioning the document (strategy question - question your input text) 
about a definition is a practical means of establishing whether a concept is 
sufficiently defined to serve as an index term. The strategy question helps 
ind-defined to find out about the definition of metathesaurus by triggering an 
appropriate exploration act. 

Searching the article for a definition of “metathesaurus”. Harold searches 
for and finds a section on metathesauri (see Fig. 4.59, thinking-aloud protocol: 
“Yes, he has a section here on metathesauri”). Since he finds the section first 
and then starts reading it, he must be supported in his search by the outline 
(strategy by-form). As soon as he has reached the heading “3. Metathesauri” he 
anticipates that the text body will explain what is announced by the heading 
(strategy head-in-text - expect to find in the text what is announced by the 
heading). Indeed, right at the beginning of the metathesauri section he finds a 
definition: “A metathesaurus transcends a set of thesauri”. 

Putting it down. “Metathesauri” meets the definedness condition for an index 
term. There is no doubt about the relevance of the concept metathesaurus as it 
has been picked up from the title (strategy relevant-title). Harold confirms to 
himself that he has already covered the term “thesauri” (“that were covered” - 
strategy done). He repeats the definition in his own words (strategy 
own-words). He is satisfied and able to decide “So I think then that metathesauri 
becomes an index term as well” and writes (strategy write). As output he 
produces “Metathesauri”. In the document theme area, the notion is added to the 
indexing-oriented representation of the theme. 
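
The definedness check of this working step can be sketched roughly as follows. The sketch assumes that the article is available as a list of (heading, body) pairs and that a truncated word stem is enough to recognize the concept; both the data layout and the function name are hypothetical. The heading test stands in for by-form and head-in-text, the sentence test for the question about a definition.

```python
import re
from typing import Optional


def find_definition(stem: str, sections: list[tuple[str, str]]) -> Optional[str]:
    """Look for a defining sentence in a section whose heading announces the concept.

    Returns the first sentence of such a section that mentions the stem,
    or None if no heading or sentence matches.
    """
    stem = stem.lower()
    for heading, body in sections:
        if stem in heading.lower():  # the heading promises the concept
            for sentence in re.split(r"(?<=[.!?])\s+", body.strip()):
                if stem in sentence.lower():  # the text body keeps the promise
                    return sentence
    return None
```

With a section headed “3. Metathesauri” whose body begins “A metathesaurus transcends a set of thesauri.”, the sketch returns that sentence for the stem `metathesaur`. A candidate term for which it returns None would fail the definedness condition.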



Working step Rada-5: Visual processing on a single word basis 



Harold goes beyond the standards of normal reading in favor of image interpretation. He 
gains an index term by looking at a page with a pattern recognition view, by noticing 
the frequency of the item “index” which is distributed all over the page, and by 
establishing the relevance of the concept “indexing”. 

The plausible assumption is that in working step 5, Harold takes a look at the 
first page of the article (compare Fig. 4.60). In the subsequent working steps 6 
and 7, which are not included here, we shall find him at MeSH Maintenance 
and Thesauri Maintenance, i.e., paragraph 2.2 on the first page of the article 
(see external input window of the exhibit). In the current step, indexing catches 
his eye since it is scattered over the page in 23 occurrences in the forms 
“indexing”, “index”, “indexer” (not all in the segment recorded in the exhibit). 
Harold declares laconically after a pause: “Okay, we have to use indexing.” 
From the output data we can see that he has written down the term “Indexing”. 

Interpreting a pattern of single words. Interpreting information in radically 
different fashions belongs to the advanced techniques of document use. What 
Harold does can be explained by the following set of strategies: 

• He is looking for every interpretation that could be of interest, for all the 
many patterns that may help to understand situations (strategy open). 

• In particular he does not attempt to understand a text, but simply to assimi- 
late isolated words (“to pick up some terms”, as he mentions later). This is 
represented by the strategy word (perceive single words) which precludes 
a normal understanding. 

• He discovers a somehow eye-catching and interpretable constellation of 
items, which can be distributed over the page in any pattern. We can 
describe it as a starred-sky pattern where the occurrences of indexing figure 
as stars while everything else is background. Developing a special input 
pattern is allocated to the strategy select (perceive only text passages that 
correspond to your current parameter or pattern). 

Evidently, the eyes move freely when serving image interpretation as in the 
current working step. The strategy read-free goes over the journal page in any 
sequence as one scans a painting, read-find finds the positions of the items of 
interest, read-open is free to accept any repertoire of signs as requested by the 
interpretation processes, and read-form copies at least some occurrences of 
indexing into the document surface representation. 
external input 

2. Thesauri 

[...] a retrieval system is largely dependent upon the document classes in it. 
These classes may be based on terms from an indexing language (12). One of 
the most popular forms for an indexing language is a thesaurus. A thesaurus is 
a set of concepts in which each concept is represented with at least synonymous 
terms, broader concepts, narrower concepts, and related concepts. 

2.1 Principles 

In the development of a bibliographic citation system, one dilemma is whether 
to build a thesaurus first and then use it for indexing the documents or to index 
the documents by free terms and then construct the thesaurus after accumulating 
free terms. As is often true in life, the middle ground is particularly attractive, 
and that means in this case building a thesaurus and indexing documents 
hand-in-hand. 

An institution which indexes many documents and maintains a large thesaurus 
may have special staff for thesaurus maintenance. This thesaurus group consists 
of experts in the domain of the literature to be processed. These experts have a 
grasp of the terminology and semantic subtleties of the subject field. The 
thesaurus group is responsible for collecting index terms, and making the 
thesaurus as up-to-date as possible for the index group. The index group indexes 
the documents according to the latest version of the thesaurus and suggests new 
index terms to the thesaurus group. 

2.2 MeSH Maintenance 

At the National Library of Medicine, which indexes about [illegible] documents 
per year, about 100 people are devoted to indexing. About a dozen index terms are 
assigned to each document, but an indexer is expected to spend only 15 minutes 
on each document. The indexing staff is complemented by a thesaurus staff of 
about 5 people. The thesaurus, the Medical Subject Headings (MeSH), which they 
maintain includes about [illegible] main headings in a [illegible]-level hierarchy. 

While the instructions to indexers emphasize that they must base their indexing 
work on the whole document rather than on just the title and abstract, protocol 
analysis has revealed that in fact the indexers place a heavy emphasis on terms 
available in the title and abstract. 

Fig. 4.60. Working step Rada-5: Visual processing on a single word basis 
Fig. 4.61. Working step Rada-12: Indexing a broader term as retrieval aid 

Relevance assessment and encoding. Indexing is proposed as a candidate 
indexing term because of its striking pattern of occurrence. In the eyes of the 
relevant-catch strategy, it qualifies as a good catchword. “Indexing” has a 
direct connection to notions of the title (checked by the strategy relevant-call) - 
because, after all, thesauri are made for indexing. Harold stores indexing 
(strategy hold) in the memory area document theme. He writes his index term 
down. 
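
The starred-sky pattern can be approximated computationally by counting isolated word forms instead of understanding sentences. The following is only a sketch under explicit assumptions: word forms are grouped by a crude fixed-length prefix (not a real stemmer), and the parameters `stem_length` and `min_count` as well as the function name are hypothetical choices for illustration. It says nothing about how Harold's visual perception actually works.

```python
import re
from collections import Counter


def eye_catching_stems(page_text: str, stem_length: int = 5, min_count: int = 3) -> list[tuple[str, int]]:
    """Count isolated word forms by prefix and return the frequent ones.

    Assumption: forms such as "index", "indexer", and "indexing" are grouped
    simply by their first `stem_length` letters; shorter words are ignored.
    """
    words = re.findall(r"[a-z]+", page_text.lower())
    counts = Counter(w[:stem_length] for w in words if len(w) >= stem_length)
    return [(stem, n) for stem, n in counts.most_common() if n >= min_count]
```

On a page where “indexing”, “index”, and “indexer” occur again and again, the prefix `index` comes out on top, which mirrors the catchword that relevant-catch accepts in this step.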



Working step Rada-12: Indexing a broader term as retrieval aid 



Improving retrieval by complementing a set of collateral concepts with a superconcept 
is the main indexing feature in the current working step. 

On the way to working step Rada-12, the indexing terms “Thesauri” and 
“Metathesauri” have been joined by “Medical thesauri” and “Computer-based 
thesauri”. Harold feels obliged to group them together under a broader term. 
Adding a broader term to a series of collateral concepts (i.e., concepts which 
possess a common superordinate concept) is standard practice. In the retrieval 
situation, the broader term is useful in searches that deal with all or many 
types of a concept, avoiding losses of relevant documents and the necessity to 
enumerate all collaterals individually. In our case, “Controlled vocabulary” 
can replace “Thesauri or Metathesauri or Medical thesauri or Computer-based 
thesauri” (compare Fig. 4.61). 

Since Harold works upon concepts which are present to his mind he needs no 
external input (see the exhibit of the working step). 

Indexing for better retrieval. The indexer’s intention to make retrieval safer 
and easier can be assigned to two strategies: 

• ind-safe: Index safely, add a descriptor if it enhances retrieval safety. 

• ind-broader: Use a broader term for indexing. 

They are triggered (or in other words, the summarizer is alarmed) by a list of 
indexing terms of the same sort. The indexer sees a need to patch a possible 
retrieval hole with a more general concept. A thesaurus or ontology is 
presupposed by ind-broader, either in the summarizer’s mind or as external indexing 
vocabulary. From there, the broader term (or superconcept) is taken. The strategy 
ind-broader equips the generalization strategy with the collateral concepts. As its 
name promises, generalization goes up the hierarchy until it reaches a common 
superconcept for all collaterals. Finally, ind-broader proposes adding the 
superconcept to the indexation. 
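
The generalization step can be sketched as a search for the nearest common superconcept in a hierarchy. The sketch assumes a mono-hierarchical thesaurus stored as a mapping from each term to its single broader term; this `broader` mapping and the function names are illustrative assumptions, and a polyhierarchical thesaurus would need a more general search.

```python
from typing import Optional


def ancestors(term: str, broader: dict[str, str]) -> list[str]:
    """Chain from a term upward: [term, broader term, next broader term, ...]."""
    chain = [term]
    # Stop at the top of the hierarchy, or if the mapping loops back on itself.
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def common_superconcept(collaterals: list[str], broader: dict[str, str]) -> Optional[str]:
    """Go up the hierarchy until a superconcept shared by all collaterals is reached."""
    if not collaterals:
        return None
    chains = [ancestors(term, broader) for term in collaterals]
    for candidate in chains[0][1:]:  # strictly broader than the first collateral
        if all(candidate in chain for chain in chains[1:]):
            return candidate
    return None
```

If each of the four indexing terms of this step is entered with “Controlled vocabulary” as its broader term, the sketch returns “Controlled vocabulary”, the descriptor that ind-broader proposes to add.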

Relevance assessment. Since the broader term is related by the obvious 
generic relations to the narrower terms Thesauri, Metathesauri, etc., which are 
already part of the theme representation, the strategy relevant-call judges it 
attachable and relevant. Controlled vocabulary is noted in the memory area 
document theme. Harold writes down his result (strategy write). 



4.3.6.8 The Sperl sequence - the difficult representation 
of the epistemological subject model 

Summarizer: Hannelore Schott, IZ Sozialwissenschaften, Bonn, Germany 

The following sequence shows how Hanne indexes the concept epistemological 
subject model as the connected conceptual pair cognition theory. The concept 
is new to the summarizer. It occurs in a monograph by Waltraud Sperl: General 
theories on women's possibilities and opportunities for change, comprising 
around 200 pages. Hanne uses the thesaurus of the social sciences information 
center IZ Sozialwissenschaften. 

The indexing problem. From a comparison between the original formulation 
and the thesaurus descriptors, we can see that Hanne has had to accept a se- 
vere loss of precision in transposing the concept from the original document 
into thesaurus terms. This also illustrates her main problem: from the concep- 
tualization in the document to the presentation in thesaurus terms Hanne has a 
long way to go. No wonder that she needs a total of six working steps to repre- 
sent epistemological subject model under the given restrictions of thesaurus in- 
dexing. The summarizer comes to grips with two sorts of problems: 

• she is confronted with a new concept, namely epistemological subject 
model, which is not listed in her thesaurus; 

• she reconceptualizes the new concept with available broader concepts. 

Overview of the sequence. Out of the total 54 working steps of the Sperl in- 
dexing process, steps 18, 19, 23, 26, 27, and 47 are presented here. Before step 
18, Hanne has already come across the epistemological subject model in the 
table of contents in two headings of main sections: The epistemological subject 
model as the basis for a concept of change for women and Concepts of change in 
the women's movement from the aspect of the epistemological subject model. 
Hanne has memorized the rough points of the content organization, and hence 
also the epistemological subject model in working steps 2-5. In step 18, she 
reacts for the first time to the appearance of the epistemological subject model 
in the text. In step 19, she notes down this occurrence. In working step 23, 
Hanne attempts to reach a better understanding of the term. We meet her at the 
table of contents in step 26, inspecting the outline of the first section dedicated 
to several aspects of the epistemological subject model. In step 27, she refers 
back to her notes. She reproduces the concept at first with a combination of 
three descriptors, one of which is jettisoned again in step 47, while subjectivity 
and rationality, two descriptors produced in step 29, are maintained. 
The structure of the Sperl summarizing process. Figure 4.62 gives an 
overview of the whole Sperl indexing process. The steps dedicated to the 
epistemological subject model are identified by arrows pointing to their source 
representation(s). In this way we can situate Hanne’s discontinuous effort to 
index the epistemological subject model. A general technical explanation of the 
figure is given in Sect. 4.3.5. 

According to the diagram of the working process, the first note, dealing with 
the epistemological subject model, is taken in step 19. The corresponding draft 
descriptors are produced in working step 27. In step 47, Hanne combines 
information from her document knowledge and the three draft descriptors to 
cancel the descriptor model because of its redundancy. 

The process overview diagram also allows us to observe phases during which 
the summarizer focuses on one type of subtask such as exploration, target text 
production, and revision. As long as information items are not processed further 
than into the document knowledge representation, the summarizer concentrates 
on document exploration. Hanne interrupts exploration activity by a first 
note-taking act in working step 19 and by the first descriptor assignment in step 25. 
During these steps, the first information items reach the note and the draft 
output representation, respectively. The exploration phase gives way to a se- 
quence where Hanne has swung the focus of her attention from information 
intake to the production of output, but nevertheless exploration steps remain 
frequent. They intermingle now with production-oriented working steps. Later, 
the production phase gives way to a so-called revision phase. In the Sperl 
process, revision would begin with working step 42 where, for the first time, a 
draft index term is questioned and replaced by a revised one. While Hanne re- 
vises some of her descriptors, she manages to deal with other activities such as 
reading or producing new descriptors, such that most of the time, her activities 
are mixed. 

The Sperl indexing exemplifies a working process with a phase organization 
(see Fig. 4.15). Readers may compare it with Fig. 4.34 where Marliese Gunther 
pursues an instant production strategy and begins to write on the spot. Whereas 
Hanne does not enter anything into her draft summary until working step 25, 
i.e., during almost half of the summarizing process, Marliese writes the first 
statement of the draft summary as early as in her working step 2. While Hanne 
concentrates on draft indexing in the middle phase of her work and allows for a 
relatively extended revision phase, Marliese keeps producing target statements 
as often as her input presents good material. She does revise at the end of her 
work, but not very much. 
Fig. 4.62. The Sperl indexing process (overview) 

Working step Sperl-18: The epistemological subject model 
is encountered in the introduction 



Hanne reads a passage of the introduction by sampling, i.e., picking up only 
beginnings of paragraphs. 

The introduction is indeed quite long, as Hanne already notices in working step 11 
when looking through it for an overview. So she explores it with moderate and 
variable intensity, skipping paragraphs, perceiving samples of others, and read- 
ing some passages in detail. 

At this point, she is already aware of the epistemological subject model as a 
central concept of the book (the first section is dedicated to it - see the 
document scheme window of Fig. 4.63). However, her understanding of the 
book’s subject is still limited. She operates locally and attaches to the current 
chapter heading what she learns about the epistemological subject model. Only 
later will she relate the epistemological subject model to everyday theories of 
women, the global theme (see the document theme representation of Fig. 4.63). 

Document exploration by sampling. In the current step, Hanne is busy 
exploring the introduction of the book, being aware of her position in the 
document outline according to the strategy by-form, and keeping to high-level 
access information as recommended by the strategy top-level. Since the 
introduction is so long, her metacognitive control insists on economical behavior 
(strategy economy). It is operationalized by a sampling technique (strategy 
sample) looking at units given by the document, currently paragraphs (strategy 
unit), and focusing attention on their beginnings (strategy first). 

Visual reading is done by the read-form strategy, assisted by read-on-demand 
that jumps from one reading position to the next. 

It is revealing to compare Hanne’s sampling technique to the similar 
behavior of Edward in working step Goonatilake-95 (Fig. 4.56). They differ in two 
points: 

• Whereas Edward theorizes about finding topic sentences at the beginning 
of paragraphs, Hanne does not. She reads the beginnings because they of- 
ten convey important information, without having in mind authors who are 
systematically trained to start paragraphs with a topic sentence. Therefore 
Hanne is, in contrast with Edward, not presumed to have a topic-sentence 
strategy. 

• As Hanne does not intend to find sentences at paragraph beginnings, the 
read-param strategy for tailor-made information intake with sentence 
recognition is not needed. It can be replaced by the simpler strategy read- 
on-demand. 
Relevance assessment and encoding. Hanne’s “hm” completes the working 
step and shows that she reacts to her reading. We assume that she has found 
relevant and encoded that the epistemological subject model is a change and 
learning theory for women. A simple “hm” seems poor evidence, but the 
evidence is strengthened when Hanne notes in the next working step the place 
where she found the definition of the epistemological subject model. 

The relevance assessment strategies insist on having at the paragraph 
beginning (strategy relevant-unit) an expression emphasized by a positive hint of the 
author (“such a ... is to be found” - strategy relevant-texthint). The 
corresponding statement can be attached to the chapter heading - the 
epistemological subject model - by the strategy relevant-scheme (see the document 
scheme window of Fig. 4.63). 

However, a better hypothesis is that Hanne is collecting material relevant to 
the subtheme epistemological subject model. Then relevant-call would best test 
and link the candidate statement to the thematic representation. Since every- 
day theories are qualified therein as implying a theory of learning and personal 
development or change, nothing stands in the way of installing an elaboration 
relation. The reader is invited to provide an appropriate entry in the document 
theme representation. 



Working step Sperl-19: The epistemological subject model is noted 

Hanne visibly advances in her indexing of the epistemological subject model. She finds 
an explanation of everyday theories, which turn out to be a sort of a synonym of the 
epistemological subject model, such that at the end of the step she has a better grasp of 
the concept. Now she is able to relate the epistemological subject theory to the eve- 
ryday theories announced in the title. She notes where in the book it is explained. 



Two exploration acts. Hanne performs two exploration acts during the current 
working step: 

• she reads the paragraph about the everyday theories, 

• she determines the page number. 

For these two types of reading, she needs different exploration intentions: 

For reading the next paragraph it is important that the summarizer is still in 
the introduction, paying attention to a promising area of the document (strat- 
egy top-level). She is aware of her position in the document organization 
(strategy by-form). Information intake goes by paragraphs (strategy unit). When 
Hanne encounters the indicator phrase “... are the interesting constructs of our 
research here” she switches to full perception instead of reading paragraph be- 
ginnings only (strategy texthint - make use of textual hints in order to find rele- 
vant passages). Full reading for understanding is imposed by browse, optical 
reading is done by the standard read-form strategy. 

As soon as Hanne wants to take down her note, she needs to know the page 
number (strategy location - state explicitly the position of an information 
unit). She looks it up (strategy retrieve). Again she can profit from her knowl- 
edge of the document (and page) structure (strategy by-form). Visual reading 
includes going straight to the right position on the page (strategy read-free) 
and finding a known target (strategy read-find). 

A similar act of looking up a page number is found in step Goonatilake-65 
(Fig. 4.51). 

Relevance assessment and linking the epistemological subject model to the 
theme. Hanne reads a paragraph that is emphasized by the author (“... are the 
interesting constructs of research here” - strategy relevant-texthint). It has the 
special advantage of explaining that an epistemological subject model is an 
everyday theory. Hanne notes this (quasi-)synonymy. She states that she found 
a definition of the epistemological subject model after reading a passage that 
talks more about everyday theories than about the epistemological subject 
model itself. 

The whole passage links up to the first section of the document scheme: The 
epistemological subject model as the basis for a concept of change for women 
(strategy relevant-scheme). As discussed in the previous step, relevant-call 
should also consider partial thematic structures such as the epistemological 
subject model and attach meaning items from the input document to them, too. 

The newly processed information can also be linked to the core theme of the 
book, the everyday theories of women (strategy relevant-call). As is often the 
case, more than one semantic link can be established, in particular elabora- 
tion, restatement, and explanation relations. 

Note taking. To produce a useful note, Hanne performs three subtasks: she la- 
bels the found passage with its semantic role as being an explanation and 
definition of the concept under investigation (strategy label). The label is not 
taken from the text, but from the summarizer’s own knowledge (strategy 
elaboration). After that, Hanne finds out on which page the interesting passage 
figures (strategy location). For this purpose, she retrieves the page number as 
described above. Next, she turns to writing. She abridges her statement 
(strategy memo) and writes it down (strategy write). 
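
The three subtasks of note taking can be summarized in a small data structure. This is a sketch only: the class, its field names, and the memo format are hypothetical, and the values used when trying it out are illustrative rather than a transcript of Hanne's actual note.

```python
from dataclasses import dataclass


@dataclass
class Note:
    concept: str  # the concept under investigation
    label: str    # semantic role supplied by the summarizer (strategies label, elaboration)
    page: int     # position of the passage in the document (strategy location)

    def memo(self) -> str:
        """Abridged wording of the note (strategy memo)."""
        return f"{self.concept}: {self.label}, p. {self.page}"


def write(note: Note, sheet: list[str]) -> None:
    """Put the abridged note on the note sheet (strategy write)."""
    sheet.append(note.memo())
```

A note built with the label “explanation and definition” and a page number thus ends up as one short line on the note sheet, to which the summarizer can refer back later, as Hanne does in step 27.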



Working step Sperl-23: Understanding the concept better 



Hanne is still unhappy with her understanding of the epistemological subject model. 
She tries to penetrate the concept better by rereading the explanatory passage. 

Routine summarization normally involves thinking strategies of a limited range 
according to a goal-oriented working plan. However, the routine may be 
interrupted at any time by metacognitive activities, general thinking, 
socializing, or other activities that span a wider range of cognitive abilities (see 
working step Mackin-7 (Fig. 4.31) where a checking activity plays the same 
role and step Hearn-6 (Fig. 4.44) which is wholly dedicated to planning). Such 
an intrusion of general intelligence may be motivated by processing problems, 
by sentiments, etc. In step Sperl-23 (compare Fig. 4.65) the summarizer comes 
to grips with an understanding problem. Whether this serves just her indexing 
task or whether it is also driven by natural curiosity - who would dare to 
decide? 

No new information is in fact assimilated. Hanne’s memory areas remain un- 
changed, although the information may be better encoded and better integrated 
into her prior knowledge. 

Metacognitive supervision, asking, and planning. “What does the episte- 
mological subject model mean, I haven’t quite understood.” The fact that 
Hanne discusses her degree of comprehension shows her metacognitive moni- 
toring at work. She falls back on a strategy of general knowledge acquisition 
(strategy know - manage to understand central concepts and facts contained in 
the document). “Whoops, again” she says and tries again (strategy repeat). 
Her understanding problem is not unusual. It corresponds to the standard ques- 
tion about a definition or explanation which summarizers inquire of the 
document (strategy question - compare working step Rada-3, Fig. 4.58). 

In order to obtain an answer from the text, Hanne adapts her working plan 
and foresees a second reading of the explanatory passage. Planning involves 
two activities: stating what will happen next (strategy plan) and updating the 
working plan accordingly (strategy working-plan). Instantly, the adapted plan 
is put to work. 
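
The interplay of plan and working-plan can be pictured with a small sketch. It is an illustration only and not part of the empirical material: the queue working_plan, its two remaining steps, and the helper names plan and update_working_plan are assumptions made for this example.

```python
from collections import deque

# Hypothetical remaining steps of the working plan (not taken from the protocol).
working_plan = deque(["index the concept", "classify the document"])


def plan(step: str) -> str:
    """Strategy plan: state explicitly what will happen next."""
    return f"Next: {step}"


def update_working_plan(queue: deque, step: str) -> None:
    """Strategy working-plan: enter the new step at the head of the plan."""
    queue.appendleft(step)


statement = plan("reread the explanatory passage")
update_working_plan(working_plan, "reread the explanatory passage")

# The adapted plan is put to work at once: the new step is taken first.
current = working_plan.popleft()
```

In this reading, the earlier steps stay on the plan untouched and are resumed once the rereading is done.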

Rereading for better understanding. Hanne goes through the explanation 
again (strategy explore influenced by refresh). She has to find it (strategy 
search, standing for sequential search) before she can read it with the intention 
of understanding (strategy browse). She finds the passage on the previous 
page, page 12. Thus she needs three optical reading strategies: read-back 
(manages backward jumps in reading), read-find (identifies known targets), 
and read-form (takes characters and layout features in). 

In working step Sperl-24, which is not presented here, Hanne also rereads 
the remainder of the paragraph about the epistemological subject model with 
the same intention. 

Fig. 4.65. Working step Sperl-23: Understanding the concept better 



Working step Sperl-26: “How are we going to find a keyword 
for the epistemological subject model?” 



Hanne returns in step Sperl-26 to the difficult indexing of the epistemological subject 
model. She does not manage to complete the indexation in the current working step. 

After representing in working step Sperl-25 the title Everyday theories of 
women: possibilities for change by the combined descriptors 

everyday theory (1), woman, change (1) 

Hanne resumes the indexing of the epistemological subject model. The exhibit 
(Fig. 4.66) shows the responsible strategy index (represent a central document 
concept by a descriptor or a combination of descriptors) in the company of ind- 
increment that enables the indexing process to stretch over several working 
steps. 

Fishing for indexing ideas in the table of contents - without success. “... 
how are we going to find a keyword for the epistemological subject model? 
Hm ...” expresses local goal setting (strategy goal - state your current goal). 
The quoted utterance informs us what is going to happen next (strategy plan). 
Aim and next activity are entered into the working plan (strategy working- 
plan). Nothing is said about the way this goal can be reached, but from 
subsequent activity the decision becomes clear: Hanne tries to fish for good in- 
dexing ideas in the table of contents (strategy ind-outline - draw indexing 
concepts from the document outline). This corresponds to a well-known 
technique of summarizers. Unfortunately, it produces nothing useful. 

Interpreting the table of contents. Hanne’s structure-driven and goal-driven 
exploration of the table of contents is interesting. After exploring at the level of 
the main sections of the book: core assumptions, criticism, methods of change, 
Hanne concludes that the section about the epistemological subject model 
seems most promising (strategy inference). She states this: “So, the first 
chapter” (strategy state-result - state your result), adapts her working plan 
again and starts inspecting the first level of subdivision in the first chapter. 
There, she looks at all headings in normal reading sequence. The following 
strategies contribute to steering the information intake: 

• content (explore the table of contents) makes sure that Hanne uses the 
table of contents which qualifies as a rich information area. 

• by-form (follow the formal document organization) enables Hanne to find 
the table of contents and to exploit its inner organization. 

• heading (restrict perception to headings) excludes everything which is not 
a heading from information acquisition. As in the table of contents all 
entries are headings, they all are admitted to processing. 

• level-in (restrict perception to one level of text organization) organizes 
Hanne’s reading by levels of document structure: she first reads items of 
chapter level, then headings inside Chapter 1. 

• unit (perceive by units, e.g., paragraphs) partitions input in units given by 
the document, here individual entries in the table of contents. 

For optical reading, Hanne needs a strategy that takes in characters and layout 
(read-form), a finder (read-find) for identifying the chapter headings, and a 
way to move backward during reading (strategy read-back). 
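
The level-by-level reading of the table of contents can be sketched as follows. The sketch is an illustration under stated assumptions, not a model taken from the study: the Entry type, the functions read_level and find_entry, and all heading texts (in particular the placeholder subheadings) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    """One heading of a table of contents (hypothetical structure)."""
    title: str
    children: List["Entry"] = field(default_factory=list)


def read_level(entries: List[Entry]) -> List[str]:
    """level-in and unit: take in the headings of one level only,
    entry by entry, in normal reading sequence."""
    return [entry.title for entry in entries]


def find_entry(entries: List[Entry], concept: str) -> Optional[Entry]:
    """Goal-driven choice: first heading of this level naming the concept."""
    for entry in entries:
        if concept.lower() in entry.title.lower():
            return entry
    return None


# Invented outline; the wording of the headings is assumed for illustration.
toc = [
    Entry("Core assumptions: the epistemological subject model", [
        Entry("1.1 (placeholder heading)"),
        Entry("1.2 (placeholder heading)"),
    ]),
    Entry("Criticism"),
    Entry("Methods of change"),
]

chapters = read_level(toc)                       # first pass: chapter level
target = find_entry(toc, "epistemological subject model")
subheadings = read_level(target.children) if target else []
```

The second call to read_level corresponds to the inspection of the first level of subdivision inside the chosen chapter.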



Working step Sperl-27: “model - cognition(2) - theory(2)” 



Hanne represents the epistemological subject model with descriptors from the Social 
Sciences Information Center thesaurus. From her commentary it becomes clear that she 
is dealing with a difficult case: “... how might I go about it, this, this term”. Although 
she is not happy with her solution, she accepts and notes it for the moment. 

Hanne’s problem is that she has to index the epistemological subject model 
with extremely general descriptors, because the thesaurus lacks more specific 
ones. To map her concept to a thesaurus description, she first reconsiders the 
precise wording of her document, then she is busy with the thesaurus. To both 
information sources she devotes the necessary attention, such that indexing 
knowledge becomes the most prominent feature of the working step. In the 
exhibit (Fig. 4.67), three packages of indexing expertise are distinguished: 
strategies for developing indexing concepts from the document, strategies of 
thesaurus use, and a strategy for formally correct presentation of descriptors. 

Searching for inspiration in the introduction. After her failure in the table 
of contents, Hanne returns now to the explanation on page 12, hoping to find 
there a starter indexing concept. The thinking-aloud protocol records her 
planning (“Read on page 12 how I might go about it, this term...”). She states 
what to do next (strategy plan), updates her working plan (strategy working- 
plan) and proceeds to action. 

Configuring an indexation using document and thesaurus. As Hanne has 
to index a difficult concept that she learned minutes ago, she has to think seri- 
ously about it (strategy ind-plan - determine an indexing concept without 
considering its formulation). As planned, she looks into the document to find 
an indexing idea (strategy question - question your input text). She obtains 
from there everyday experiences and everyday theories. A standard indexing 
approach is to subsume an author concept under a superconcept (strategy 
ind-broader) and/or to express it by a combination of two descriptors (strategy 
ind-combi). Hanne applies both. Thus ind-broader engages the strategy 
generalization to climb up the concept hierarchy. It returns candidate concepts 
that are general enough to be descriptors: cognitive theory and model. 

The next package of indexing expertise Hanne applies is dedicated to 
thesaurus use. Her first problem is to delineate more precisely her possible 
indexing concepts, taking into account that the definition inside the thesaurus 
may differ from the normal use in the domain. She begins by checking the 
thesaurus concept model (see thesaurus segment in the input window), which 
is explained by its narrower terms and the used-for quasi-synonyms. She 
follows the strategy ind-sem - use thesaurus information (scope notes, 
relations, etc.) to understand the meaning of a descriptor. The check shows 
that model is not foreseen for representing a cognitive theory. Hanne states that this went 
wrong (strategy failure). Nevertheless she keeps model until further examina- 
tion (strategy wait-and-see - postpone a decision until you know more). 

Now Hanne tries to find a thesaurus entry cognitive theory (strategy descrip- 
tor - check if an expression is a descriptor before using it). No success again, 
the alphabetical array lists only a cognitive learning theory. “That doesn’t fit,” 
Hanne says (strategy failure). She starts a new thesaurus search with the con- 
cept cognition. This time she finds a suitable entry. It tells her to represent 
cognitive theory by cognition + theory. Hanne accepts cognition + theory 
(strategy ind-get - use the thesaurus to obtain descriptor candidates) with some 
grumbling (strategy sentiment - express your feelings) as a preliminary solu- 
tion (strategy tentative - accept a tentative solution) and with the reservation 
to retract it later, if better knowledge is at hand (strategy wait-and-see - post- 
pone a decision until you know more). 
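
The thesaurus lookup described above can be sketched in a few lines. This is a simplified illustration, not the Social Sciences Information Center thesaurus itself: the sets and mappings below hold only the terms mentioned in the text, the names DESCRIPTORS, USE_COMBINATION, is_descriptor and get_descriptors are hypothetical, and the combination rule is keyed by the non-descriptor for brevity, whereas Hanne finds it under the entry cognition.

```python
from typing import Dict, List, Optional

# Toy thesaurus fragment (assumed for illustration only).
DESCRIPTORS = {"model", "cognition", "theory", "cognitive learning theory"}
USE_COMBINATION: Dict[str, List[str]] = {
    "cognitive theory": ["cognition", "theory"],
}


def is_descriptor(term: str) -> bool:
    """Strategy descriptor: check if an expression is a descriptor."""
    return term in DESCRIPTORS


def get_descriptors(concept: str) -> Optional[List[str]]:
    """Strategy ind-get: use the thesaurus to obtain descriptor candidates.

    Returns the concept itself if it is a descriptor, a prescribed
    combination if one exists, and None otherwise (strategy failure).
    """
    if is_descriptor(concept):
        return [concept]
    combination = USE_COMBINATION.get(concept)
    if combination and all(is_descriptor(d) for d in combination):
        return combination
    return None


direct = is_descriptor("cognitive theory")        # no such entry
candidates = get_descriptors("cognitive theory")  # combination instead
```

What the sketch cannot show is the tentative status of the result: in the protocol the combination is accepted only for the moment and may be retracted later.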

Formal correctness. Hanne connects the general term “theory” with 
“cognition” (strategy ind-index) as her thesaurus stipulates. She writes down the 
descriptors and their connecting operators (strategy write). 

Consulting different inputs. In the current working step, Hanne deals with a 
maximum of three different external inputs: 

• the source document when searching for an indexable notion 

• the thesaurus as a source for an accepted target expression 

• her notes if she needs to look up the location of her definitional passage 

Whereas it is evident that she reads from the document and from the thesaurus, 
it remains unclear whether she really looks at her notes. In any case Hanne 
must be able to switch between inputs (strategy switch - switch between 
information sources). She needs acquisition intentions and external reading 
strategies for dealing with all of them. 
As long as she reads in the source document, she sticks to the introduction 
(strategy top-level). This is only possible because she knows the document 
organization (strategy by-form). She finds the interesting passage through an 
informed retrieval (strategy retrieve) and reads it for understanding (strategy 
browse). In the thesaurus, Hanne can retrieve the respective entries (strategy 
retrieve) because she knows the thesaurus structure (strategy by-form). The 
note is short enough for normal reading (strategy browse). From all her 
information sources, Hanne re-acquires data which she knows in principle. This 
is also true for the thesaurus whose co-author she is. So all exploration serves 
memory support (strategy refresh). External reading can be done by read-find 
and read-form. 



Working step Sperl-47: The “model” descriptor is cancelled 



In working step Sperl-47, Hanne revises the descriptor combination “model”, 
“cognition” and “theory” at the same time as two additional descriptors, “subjectivity” 
and “rationality”. 

Hanne revises as soon as she has an overview of her descriptors. She follows 
two typical lines of thought: 

• rev-ind-content - improve the factual correctness of the indexation. The 
draft configuration of concepts may have content flaws. Frequent mistakes 
are concepts that duplicate or intersect each other, or do not hit the 
subject as precisely as possible. 

• rev-ind-complete - improve the completeness of the indexation. An indexa- 
tion may be improved by adding descriptors suggested, for instance, by the 
thesaurus or a possible retrieval situation rather than by the document 
itself. 

Because they are by and large redundant, the existing descriptors model and 
theory raise criticism. Hanne complies with the anti-redundancy rule once (say 
it only once) and cancels one of them (“model”). Her attempt to improve the 
completeness of the indexation goes wrong. After rereading her descriptors, she 
is tempted to add a descriptor objectivity (strategy elaborate). However it 
incurs the resistance of an indexing strategy named here ind-restrict - restrict 
yourself to important concepts, do not index others. Thus Hanne states her 
failure and gives the reason (strategy explain): everyday theories or epistemo- 
logical subject models are subjective. Objectivity would be a misleading de- 
scriptor for them. 
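
The two revision moves can be illustrated by a minimal sketch. It makes several assumptions that are not in the source: the function names cancel_redundant and try_add are invented, the redundancy judgment is passed in as ready-made (keep, cancel) pairs, and the importance test is a stand-in for Hanne’s own assessment.

```python
from typing import Callable, List, Tuple


def cancel_redundant(descriptors: List[str],
                     redundant: List[Tuple[str, str]]) -> List[str]:
    """Anti-redundancy rule: for each (keep, cancel) pair judged redundant,
    drop the second descriptor if both are present."""
    result = list(descriptors)
    for keep, cancel in redundant:
        if keep in result and cancel in result:
            result.remove(cancel)
    return result


def try_add(descriptors: List[str], candidate: str,
            is_important: Callable[[str], bool]) -> List[str]:
    """ind-restrict: admit a new descriptor only for an important concept."""
    if candidate not in descriptors and is_important(candidate):
        return descriptors + [candidate]
    return list(descriptors)


draft = ["model", "cognition", "theory", "subjectivity", "rationality"]
revised = cancel_redundant(draft, [("theory", "model")])

# Assumed judgment: objectivity would mislead, so it is not important here.
final = try_add(revised, "objectivity", lambda c: c != "objectivity")
```

The sketch leaves the actual decisions, namely which descriptors duplicate each other and which concepts matter, to the summarizer, as the working step does.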

In working step 48, which is not presented here, Hanne decides to leave it at 
an index with a total of 11 descriptors. The remaining steps of the working pro- 
cess are devoted to classification. 

4.3.6.9 Working step Mills-15: A classification notation 
is assigned 

Summarizer: Ingetraut Dahlberg, INDEKS, Frankfurt, Germany 

Step Mills-15 demonstrates classification. This means that a document is summarized in 
terms provided by a classification system, usually expressed by a notation. 

During the current step (compare Fig. 4.69), a first classification notation is as- 
signed. The resulting notation states the subject of the paper, which happens to 
be the Bliss classification. The notation to be assigned is from the 1985 version 
of the Classification Literature Classification. In the immediately preceding 
working step Inge has decided to use her own classification system and also 
noted the decision down (see product area of the exhibit). 

As is often the case, classifying occurs here after indexation. When the 
working step begins, the document is already known to Inge as far as needed 
for indexation and classifying. In the present case, she has felt obliged to look 
only at the title and the outline. Since she is in the core area of her expertise, 
there is no reason to doubt that this very parsimonious information is sufficient 
for her to do the job. Somebody else’s cognitive preconditions might have 
stimulated a different approach. 

The working step shows how an expert summarizer arrives at her aims with 
minimal mental effort. Not only does the summarizer hardly read the paper - 
she does not invest much thinking work in classifying it either. Since 
classifying is seen as broad description, she needs to know and to work just 
enough to find the three to five main concepts and notations. 

When Inge takes up the cognitive activities of the current working step, the 
beginning of the article lies before her eyes as external input. As a second 
external input Inge uses the classification system. Since she is its author, the 
system is known to her as well. In the concrete situation, its topmost de- 
scription level, which is reproduced in the external input window, is adequate. 
The result of the classification step has been entered in the product area at the 
bottom: the Bliss Classification is represented by the code “45”. Inge has also 
written this down (see external output window). 

Identifying the theme. First and foremost, a broad description of a paper must 
pinpoint the theme, and in the case of classifying, express it by a notation. 
Inge uses current techniques: 

• She draws concepts from the title (strategy class-title). 

• She uses the outline to find main concepts of the paper (strategy class- 
outline). 

• She approaches the theme by combining both sources (strategy class- 
theme). 

Applying the three strategies puts the Bliss Classification into the focus of 
attention, because it is named in the title and, in abbreviated form (BC2), in 
two headings. By superficially basing her classification on the wording of the 
title and on the outline, Inge achieves the aim of the classification: a 
description according to the theme of the article (strategy class-theme). During 
the whole working time, Inge may refresh her memory by looking at the 
beginning of the paper. 

Assigning the notation. The notation for the Bliss Classification is retrieved 
from the classification system. Inge’s commentary about the search in the 
classification system (“has the number 4...”) permits us to recognize that she 
has some idea of the numerical code “45” in her head. Consulting the system 
means to her returning to information that she knows in principle. So she 
switches to her second information source (strategy switch), the exploration 
mode is set by the refresh strategy, and the search can rely on previous 
knowledge of the document structure (strategy retrieve). 
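
How title and outline jointly lead to a notation can be sketched as follows. The sketch is a hedged illustration, not Inge’s procedure as such: only the code “45” for the Bliss Classification and the abbreviation BC2 come from the text, while the title, the headings, the one-entry scheme, and the names SCHEME, ABBREVIATIONS, concepts_in and classify are assumptions.

```python
import re
from collections import Counter
from typing import List, Optional

# One-entry excerpt of a classification scheme (rest omitted).
SCHEME = {"bliss classification": "45"}
# Assumed mapping of the abbreviation to the full concept name.
ABBREVIATIONS = {"bc2": "bliss classification"}


def concepts_in(text: str) -> List[str]:
    """Return scheme concepts named in a text, in full or abbreviated."""
    text = text.lower()
    found = [concept for concept in SCHEME if concept in text]
    found += [full for abbr, full in ABBREVIATIONS.items()
              if re.search(rf"\b{re.escape(abbr)}\b", text)]
    return found


def classify(title: str, headings: List[str]) -> Optional[str]:
    """Combine title and outline evidence and return a notation."""
    counts = Counter(concepts_in(title))      # strategy class-title
    for heading in headings:                  # strategy class-outline
        counts.update(concepts_in(heading))
    if not counts:
        return None
    theme, _ = counts.most_common(1)[0]       # strategy class-theme
    return SCHEME[theme]


# Invented title and outline, for illustration only.
notation = classify(
    "A note on the Bliss Classification",
    ["Introduction", "Structure of BC2", "Applying BC2"],
)
```

Counting mentions is a crude stand-in for the expert’s judgment; it merely mirrors the observation that the concept appears once in the title and twice in the headings.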

4.4.1 Systematic display 

1 meta: Metacognition 
1 meta A: Expressive skill 

1 current: Describe what you are 
doing at the moment. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

2 expression: Express yourself - 
opinion, concerns, or whatever. 
(Edward, Marliese) 

3 opinion: State your opinion. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

4 personal: Develop personal 
interest and a personal approach to 
the document. (Andreas, Edward, 
Harold, Inge, Marliese) 

5 self: Characterize yourself. (Inge) 

6 sentiment: Express your feelings. 
(Andreas, Edward, Harold, Inge) 

7 standpoint: Introduce your point of 
view. (Edward, Harold, Inge, 
Marliese) 

1 meta B: Self-management 

1 meta B1: Self-control, self- 
monitoring 

8 blocked: State explicitly when you 
are blocked. (Edward) 

9 comfort: Make yourself feel 
comfortable. (Edward) 

10 concentration: Keep up your 
concentration. (Edward) 

11 fit: Keep yourself fit: take a break, 
exercise, breathing exercises. 
(Edward) 



12 monitor: Monitor your activity. 
(Edward) 

1 meta B2: Volition 

13 cheer: Cheer yourself on. (Edward) 

14 determined: Work decisively 
towards your goal. (Edward, Hanne) 

15 Don’t worry about 
problems that are irrelevant to the 
goal. (Hanne, Harold) 

16 own-way: Do it your own way. 
(Edward) 

17 stroll-along: Don’t be too strict 
when pursuing your goals, take 
things as they come. (Edward) 

18 task: Restrict your activities to the 
matter in hand. (Harold, Inge) 

19 try-again: Be tenacious. Try again. 
(Edward, Inge) 

1 meta C: Principle-driven action 

20 adapt: Find a solution that fits the 
current situation. (Marliese) 

21 avoid: Avoid problems. (Edward, 
Harold) 

22 consequent: Be consequent. 
(Marliese) 

23 feasible: Do feasible things. 
(Edward, Marliese) 

24 improve: Improve your results at 
every opportunity. (Edward, 
Hanne, Harold) 

25 issue-of-method: Discuss questions 
of method. (Edward) 

26 limits: Find solutions that do not 
overtax your competence. (Inge, 
Marliese) 

27 method: Stick to methodical 
principles. (Edward, Harold, 
Marliese) 

28 Be as orderly as necessary. 
(Edward, Hanne, Harold, Inge) 

29 precedent: Make use of precedents. 
(Andreas, Harold, Marliese) 

30 rules: Keep to the rules. (Hanne, 
Marliese) 

31 state-principle: State the 
principles behind your conduct. 
(Edward, Harold, Inge, Marliese) 

32 tentative: Accept a tentative 
solution when an optimal one is 
out of reach or unnecessary. 
(Andreas, Edward, Hanne, Harold, 
Marliese) 

33 thorough: Don’t do things by 
halves. (Marliese) 

1 meta D: Interaction with the 
environment 

1 meta D1: General interaction 

34 disturbed: Cope with interruptions. 
(Marliese) 

35 experiment: Keep to the rules of 
the experimental setup. (Hanne, 
Inge, Marliese) 

36 interrupt: Cope with intrusions by 
others. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

1 meta D2: Dealing with things 

37 preps: Make preparations for your 
work. (Edward, Harold, Inge) 

38 tech: Cope with technical 
problems. (Andreas, Edward, 
Harold, Inge, Marliese) 

39 tools: Prepare your tools. 
(Andreas) 

1 meta D3: Interaction with people 

40 aside: Use the opportunity to make 
asides to your listener. (Edward, 
Harold, Inge) 



41 audience: Involve your audience. 
(Edward, Hanne, Inge, Marliese) 

42 language: Choose the right 
language. (Inge) 

43 nice: Be nice. (Marliese) 

44 physical: Explain your physical 
well-being where necessary 
(tiredness, etc.) (Inge) 

45 polite: Be polite. (Hanne, Inge) 

2 control: Control of working 
processes 

2 control A: Planning and deciding 

2 control A1: Planning 

46 alternative: Consider alternatives. 
(Andreas, Edward, Hanne, Harold, 
Marliese) 

47 check-plan: Check that you are 
following the working plan. 
(Edward, Hanne) 

48 define-task: State what your current 
task is. (Harold) 

49 easy: Take the easy way to your 
goal. (Andreas) 

50 estimate-problem: Assess the 
difficulty of your task. (Edward, 
Harold) 

51 estimate-task: Estimate what 
remains to be done. (Edward, 
Hanne) 

52 explain: Explain your actions, the 
situation, etc. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

53 foresee: Have foresight, prepare 
for what lies ahead. (Edward, 
Harold, Marliese) 

54 generate-and-evaluate: Generate 
alternative solutions and choose 
the most appropriate. (Inge) 

55 goal: State your current goal. 
(Edward, Hanne, Harold, Inge) 

56 interest: Decide whether a working 
step is of interest. (Harold) 

57 look-ahead: Stop and explore 
what’s coming next. (Edward, 
Hanne) 

58 open-point: State what points are 
still open. (Hanne) 

59 plan: State explicitly what you 
want to do next. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

60 special-problems: Adapt your 
working plan to the current 
document. (Andreas, Hanne, Inge) 

61 working-plan: Update your 
working plan. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

2 control A2: Decision-making 

62 decide: Decide. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

63 justify: Justify your decision. 
(Andreas, Edward, Hanne, Inge, 
Marliese) 

64 prepare-decision: Prepare your 
decision, e.g., by data or problem 
analysis. (Edward, Hanne, Harold) 

65 start-decision: Start a decision 
process. (Edward, Hanne, Harold) 

2 control B: Control 

2 control B1: Controlling normal 
processes 

2 control B11: Driving the activity, 
start and stop 

66 abort: Abort wrong actions. 
(Andreas, Edward, Hanne, Harold) 

67 continue: Continue (after an 
interruption, etc.). (Edward, Inge) 

68 done: State finished actions. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

69 next: Go on. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

70 next-stage: Begin the next stage. 
(Inge, Marliese) 

71 no-op: Do nothing. (Edward, Inge, 
Marliese) 



72 precond: Check that the 
preconditions are fulfilled when 
starting a working step. (Hanne, 
Harold) 

73 start: Start. (Andreas, Edward, 
Harold, Inge, Marliese) 

74 stop: Stop. (Andreas, Edward, 
Hanne, Inge, Marliese) 

75 suspend: Suspend your activity. 
(Edward) 

2 control B12: Determine the 
mode of operation 

76 increment: Proceed incrementally. 
(Andreas, Edward, Hanne, Inge, 
Marliese) 

77 parallel: Carry out parallel 
activities. (Edward, Inge, Marliese) 

78 speed-up: Speed up. (Edward) 

2 control B13: Dealing with data 

79 collect: Collect material. (Edward, 
Marliese) 

80 earmark: Earmark items for later 
processing. (Edward) 

81 exclude: Exclude items from 
processing. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

82 log-in: Put units on the agenda, 
e.g., text passages for processing. 
(Edward) 

83 log-out: Delete items from the 
agenda as soon as they are done. 
(Edward) 

84 production: Produce a unit as soon 
as you have sufficient material. 
(Andreas, Marliese) 

2 control B14: Checking results 

85 check: Check your results. 
(Andreas, Edward, Harold) 

86 last-check: Check the results at the 
end of a working process. (Hanne) 







2 control B15: Economy 

87 economy: Don’t process more 
information than strictly 
necessary. (Edward, Hanne, Harold) 

88 enough: Do enough and not more. 
(Harold, Inge, Marliese) 

89 means-ends: Tailor your means to 
the ends. (Edward, Inge, Marliese) 

90 more: Retain too much rather than 
too little. (Edward, Hanne, Harold) 

91 time: Control how much time you 
invest. (Edward, Inge) 

92 time-limit: Omit what takes too 
much time. (Andreas, Hanne, Inge) 

2 control B2: Controlling irregular 
processes 

2 control B21: Repetitions 

93 backpoint: Mark entry points for 
backtracking. (Edward, Harold) 

94 refresh: Refresh your memory as 
often as necessary. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

95 repeat: Do it again. (Edward, 
Hanne, Inge) 

96 reset: Go back. (Andreas, Edward, 
Hanne, Inge, Marliese) 

97 return: Return to an earlier point in 
your working plan. (Edward, 
Harold, Inge) 

2 control B22: Solving problems 

98 critique: Criticize your solutions. 
(Hanne, Harold, Marliese) 

99 error: State and correct your errors. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

100 failure: State failures. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

101 last-try: Have a last try at solving 
unsolved problems, e.g., at the end 
of a working process. (Edward, 
Hanne) 



102 omission: Correct omissions 
immediately. (Andreas, Edward, 
Hanne, Harold) 

103 retract: Retract solutions if they 
prove to be no good. (Edward, 
Harold, Inge, Marliese) 

104 unsolved: Keep an eye on unsolved 
problems and try to solve them at 
the first opportunity. (Edward, 
Hanne) 

2 control B23: Postponing 
operations 

105 later: Do it later. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

106 wait-and-see: Postpone a decision 
until you know more. (Edward, 
Hanne, Marliese) 

3 intell: Basic intellectual activities 
and literacy 

3 intell A: Thinking 
3 intell A1: Dealing with knowledge 

107 activate: Activate knowledge for 
current use. (Edward, Harold, Inge, 
Marliese) 

108 aggregate: Aggregate parts to form 
a whole. (Inge) 

109 apply: Apply newly learned 
concepts. (Edward) 

110 background: Introduce your 
background knowledge. (Harold, 
Inge) 

111 check-inform: Check information 
whenever there is the slightest 
doubt. (Andreas, Hanne, Inge, 
Marliese) 

112 combine: Combine information 
from different sources. (Edward, 
Harold, Inge, Marliese) 

113 compare: Compare items. (Edward, 
Hanne, Harold) 

114 count: Count. (Edward, Hanne, 
Harold, Inge) 

115 dummy: Use dummies or general 
concepts when there is no interest 
in details. (Andreas, Edward, 
Hanne, Harold, Marliese) 

116 estimate-number: Estimate a 
numerical value. (Edward, Harold) 

117 evaluate: Evaluate solutions, 
information, etc. (Andreas, 
Edward, Hanne, Harold, Inge) 

118 experience: Fall back on previous 
experience. (Marliese) 

119 hypothesis: Formulate a 
hypothesis, e.g., before 
investigating. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

120 look-up: Look it up, use a reference 
tool. (Andreas, Hanne) 

121 memorize: Memorize text 
passages. (Edward, Hanne, 
Marliese) 

122 memory: Retrieve information 
from memory. (Marliese) 

123 own-words: State it in your own 
words. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

124 state-problem: State the problem. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

125 state-result: State your result. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

126 subquestion: Ask easy questions 
first in preparation for harder ones. 
(Harold) 

127 sum-up: Sum up what you have 
achieved. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

128 trial-and-error: If in doubt, try it 
out. (Edward, Hanne) 

3 intell A2: Understanding text 
(problem cases only) 

129 abbrev: Resolve abbreviations. 
(Inge, Marliese) 



130 degree: State the degree of your 
understanding. (Inge) 

131 know: Manage to understand 
central concepts and facts 
contained in the document. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

132 knowledge: State your own 
knowledge level with respect to 
the current topic. (Harold) 

133 open-question: State understanding 
problems, information gaps. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

3 intell A3: Reasoning 

134 calculate: Calculate. (Hanne) 

135 generalization: Generalize. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

136 inference: Infer implied 
information. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

137 reason: State reasons for facts. 
(Inge) 

3 intell A4: Imagination 

138 idea: Give way to own ideas and 
check their usability. (Edward) 

139 imagine: Imagine a result. (Inge) 

140 imagine-and-evaluate: Imagine a 
solution and its effect. (Andreas, 
Edward, Hanne, Harold) 

141 suggestion: Propose a better 
solution. (Edward, Hanne) 

3 intell B: Reading 

142 read: Read sequentially. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

143 read-back: Read jumping back. 
(Edward, Hanne, Harold, Inge) 

144 read-fast: Read fast with reduced 
information intake. (Andreas, 
Edward, Harold, Inge, Marliese) 

145 read-find: Read for retrieval of a 
known item. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

146 read-form: Read considering 
optical features and layout. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

147 read-free: Read in a free sequence 
(“direct access”). (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

148 read-image: Read image 
information. (Edward, Harold, 
Inge, Marliese) 

149 read-on-demand: Read 
intermittently, on demand. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

150 read-open: Open your eyes for 
anything that strikes you. (Edward, 
Harold) 

151 read-over: Obtain an optical 
overview. (Andreas, Edward, 
Harold, Inge, Marliese) 

152 read-param: Read according to an 
individually specified parameter or 
pattern. (Edward, Hanne, Harold, 
Inge, Marliese) 

153 read-proof: Read in proofreading 
mode. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

3 intell C: Writing and drawing 
3 intell C1: Writing 

154 annotate: Annotate your document. 
(Edward) 

155 capitals: Write in capital letters. 
(Inge) 

156 insert: Insert a passage in an 
existing text. (Andreas, Edward, 
Hanne, Harold, Marliese) 

157 memo: Abbreviate. (Andreas, 
Edward, Hanne) 

158 sequence-number: Number your 
units in sequence. (Edward) 



159 shorthand: Use short 
representations of text passages. 
(Inge) 

160 signature: Sign. (Andreas) 

161 start-writing: Start writing. 
(Andreas, Hanne, Harold, Inge, 
Marliese) 

162 write: Write. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

3 intell C2: Drawing 

163 arrow: Draw an arrow. (Edward) 

164 asterisk: Use an asterisk to mark 
an entry point or a text unit. 
(Edward) 

165 box: Draw a box around a unit. 
(Edward) 

166 cancel: Cancel text passages. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

167 circle: Circle it. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

168 cross: Mark a text passage with a 
cross. (Andreas, Edward, Harold) 

169 cross-out: Cross a text passage 
out. (Edward) 

170 link: Link text passages, e.g., 
with an arrow. (Edward) 

171 mark: Mark text passages in the 
margin. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

172 mark-stop: Mark how far you have 
got in the text. (Edward) 

173 single-out: Single out a text 
passage. (Andreas) 

174 space: Keep some free space. 
(Edward, Hanne) 

175 tick: Tick a unit. (Edward) 

176 underline: Underline a text 
passage. (Andreas, Edward, Harold, 
Inge) 

3 intell C3: Mixed graphic and 
written utterances 

177 layout: Give your product an 
attractive layout. (Inge) 

178 mark-list: Mark a list structure of a 
text passage, e.g., by numbering. 
(Edward, Harold) 

4 prof: Professional skills: 
abstracting, indexing, classifying
and descriptive cataloging 

4 prof A: Information acquisition 
4 prof A1: Interaction with the
document

4 prof A11: Exploring document
features 

179 document-feature: Determine
important characteristics of the 
document or of its parts. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

180 flaw: State content and
presentation flaws of the document 
and correct them if necessary. 
(Edward, Harold, Inge) 

181 missing: State what is missing in 
the document. (Edward) 

182 return-to-source: Return to the
original whenever needed. (Hanne, 
Harold) 

183 volume: Determine the size of a
text component. (Andreas, Edward, 
Inge, Marliese) 

4 prof A12: Navigation in the 
document 

4 prof A12.1: Orientation 

184 current-position: Determine your
current position in the document. 
(Andreas, Edward) 

185 get-overview: Obtain an overview 
of the document or a part of it. 
(Edward, Hanne, Harold, Inge) 



186 hint: Exploit the author’s text 
organization hints. (Edward, 
Hanne, Harold) 

187 location: State explicitly the 
position of an information unit. 
(Andreas, Edward, Hanne, Inge, 
Marliese) 

188 orient: Orient yourself with the 
help of formal guiding information 
of the document (table of contents, 
etc.). (Edward, Hanne, Harold, 
Marliese) 

189 use-index: Use the index and other 
comparable lists for orientation. 
(Edward) 

4 prof A12.2: Moving through the
document 

190 by-form: Follow the formal 
document organization. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

191 course: Consciously follow a 
given order, especially the order of 
the document. (Edward, Inge, 
Marliese) 

192 follow: Follow in-text cross- 
references. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

193 jump: Jump, especially in reading. 
(Edward, Harold, Inge) 

194 pass: Begin a new pass through the
document or a part of it. (Edward, 
Hanne, Inge, Marliese) 

195 skip: Skip text passages 
(forwards). (Edward, Inge,
Marliese)

4 prof A13: Interaction with 
document meaning 
4 prof A13.1: Elaboration of 
document meaning 

196 answer: Obtain explicit answers to
your questions to the document. 
(Edward, Inge, Marliese)

197 confirm: Try to confirm your
expectations in the document. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

198 contour: Use a general pattern to
carve out a meaning structure in the
document. (Andreas, Edward,
Hanne, Harold)

199 label: State the meaning role of a
text passage. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

200 macrosentence: State a global 
meaning structure of the document. 
(Hanne, Harold, Inge) 

201 main-point: Identify the most 
important points of a text unit. 
(Harold) 

202 question: Question your input text 
using your own factual knowledge. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

203 main-thread: Identify the central 
text topic. (Edward, Hanne) 

204 topic-feature: Identify 
characteristics of the text topic. 
(Edward) 

4 prof A13.2: Completing the
document meaning from your own
knowledge

205 bridge: Bridge unclear passages of 
the document with your own factual 
knowledge. (Harold) 

206 elaborate: Elaborate statements to 
expand what you find in the text. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

207 think-yourself: If the author does 
not tell you, think it out for 
yourself. (Harold, Inge, Marliese) 



4 prof A14: Considering the social
context of the document 
4 prof A14.1: The relationship with 
the author 

208 argument: Argue with the author. 
(Edward) 

209 author’s-voice: Accept the author’s
opinion. (Edward) 

210 motive: Determine the motivation 
of the author. (Harold) 

211 who-author: Find out about the 
author. (Edward, Inge) 

4 prof A14.2: Consider the
pragmatics of the document 

212 intertext: Elaborate intertextual 
meaning relations. (Inge) 

213 pragmatic: Put the document in its 
research and application 
environment. (Harold, Inge) 

214 references: Use the references to 
find out about the intellectual 
background of the document. 
(Edward, Harold, Inge) 

215 sources: Consider a document with 
respect to the sources (documents 
or authors) it takes its knowledge 
from. (Inge) 

4 prof A15: Using in-text meaning
relations 

4 prof A15.1: Exploiting 
redundancy relations 

216 better: Choose the better or best 
presentation of an information 
item in the original. (Andreas, 
Edward, Hanne, Harold, Inge) 

217 -source: Compare different
presentations of the same 
information item. (Andreas, 
Edward, Harold, Marliese) 

218 multiple-use: Be aware of the 
multiple use of meaning units in 
the document. (Edward, Inge)

219 once-in: Skip over items that you
have already seen. (Andreas, 
Marliese) 

220 redundant: Make use of redundant 
information, e.g., for quality 
control. (Hanne) 

4 prof A15.2: Using principles of
information organization

221 conclusion: Use the conclusion to 
check the completeness of your 
abstract. (Inge) 

222 core-in-abstract: Expect the most 
important contents of the 
document in the abstract. (Harold) 

223 discussion: Use the section 
“discussion” of the document to 
check your results during abstracting.
During indexing, skip it.
(Marliese) 

224 form-reflects-content: Assume that
the form of a document reflects the
structure of the described object
and the importance of its components.
(Edward)

225 formula: Expect a definition of 
formulas, functions, etc., at their 
first occurrence. (Inge) 

226 head-in-text: Assume that related 
headings and text components 
correspond in the essential 
meaning units. (Edward, Harold, 
Inge) 

227 sum-in-text: Expect that in-text 
summaries and the document itself 
correspond in the essential 
meaning units. (Harold, Inge) 

228 text-image: Assume that text and
image information in a document
are related. (Edward)

229 use-doctype: Use the regularities of 
document types and structures. 
(Harold, Marliese) 



4 prof A16: Checking the usability 
of the document 

230 test-index: Test the back-of-the- 
book index. (Inge) 

4 prof A2: Task-oriented 
perception of the document 
4 prof A21: Strategic control of 
perception 

231 define-filter: Define an interest
profile for information intake. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

232 explore: Investigate what a 
document passage means. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

233 start-explore: Begin exploring a
document. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

234 stop-explore: Stop exploring a
document, especially if it is no 
longer worthwhile. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

235 zoom: Redefine the perceptive 
filter. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

4 prof A22: Executing perception
techniques

4 prof A22.1: Defining global
perception techniques 

236 browse: Perceive with the aim of 
information intake. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

237 example: Perceive by examples. 
(Edward) 

238 image: Interpret image 
information. (Edward, Hanne, 
Harold, Inge, Marliese) 

239 open: Be open for everything that 
strikes you. (Edward, Harold)

240 overview: Restrict perception to
an optical overview. (Andreas, 
Edward, Harold, Inge, Marliese) 

241 proof: Perceive in proofreading 
style. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

242 retrieve: Find meaning units 
selectively, using the information 
structure of the document.
(Andreas, Edward, Hanne, Harold,
Inge, Marliese) 

243 sample: Explore a text passage by 
sampling. (Andreas, Edward, 
Hanne, Harold, Inge) 

244 search: Search for information or 
text passages (blind search). 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

245 select: Perceive only text passages 
that correspond to your current 
parameter or pattern. (Edward, 
Hanne, Harold, Inge, Marliese) 

246 shallow: Perceive in shallow mode 
to find out if there is interesting 
information. (Andreas, Edward, 
Harold, Inge, Marliese) 

247 structure: Perceive the structure of a 
text passage. (Harold, Inge) 

248 switch: Switch between 
information sources. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

249 table: Perceive information from 
tables. (Edward, Harold, Inge, 
Marliese) 

250 unit: Perceive by units, e.g., 
paragraphs. (Andreas, Edward, 
Harold, Inge, Marliese) 

251 word: Perceive word by word. 
(Edward, Harold) 



4 prof A22.2: Adapting the
perception filter to data 
4 prof A22.21: Reacting to optical 
highlighting 

252 marked: Concentrate on passages 
you have marked yourself. 
(Andreas, Edward, Hanne, Harold, 
Inge) 

253 skim: Perceive highlighted 
passages only. (Andreas, Edward, 
Hanne, Inge, Marliese) 

4 prof A22.22: Reacting to textual 
hints 

254 hope: Search through promising 
passages for relevant information. 
(Hanne, Inge) 

255 texthint: Make use of textual hints 
in order to find relevant passages 
(topic sentences, in-text 
summaries, etc.). (Andreas, 
Edward, Inge, Marliese) 

256 first: Search through the 
beginnings of text units. (Andreas, 
Edward, Hanne, Inge, Marliese) 

257 first-last: Search through 
beginnings and ends of text units. 
(Edward) 

258 last: Search through the ends of 
text units. (Andreas, Edward, 
Hanne, Inge, Marliese) 

259 topic-sentences: Read topic 
sentences only (first sentences of 
paragraphs). (Edward) 

4 prof A22.24: Reacting to the 
formal document organization 
4 prof A22.241: Choosing the text
organization level for perception 

260 ex-form: Explore the formal 
structure of a text component 
(outline, tables, footnotes, etc.). 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese)

261 level-in: Restrict perception to
one level of text organization, 
e.g., headings. (Andreas, Edward, 
Harold, Inge, Marliese) 

262 text-level: Explore the text body
of the document. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

263 top-level: Explore document 
passages that contain global 
information (introduction, table of 
contents, etc.). (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

4 prof A22.242: Choosing units of
text organization according to 
formal features 

264 content: Explore the table of 
contents. (Andreas, Edward,
Hanne, Harold, Inge, Marliese)

265 footnote: Perceive footnotes. 
(Edward, Hanne) 

266 heading: Restrict perception to 
headings. (Andreas, Edward, Inge, 
Marliese) 

267 no-title: Avoid perception of the
title. (Edward) 

268 state-doctype: State the current
document type. (Andreas, Edward, 
Harold, Inge, Marliese) 

269 title-information: State the core
bibliographic features of the 
document: title, author, document 
type, etc. (Edward, Hanne, Harold) 

4 prof A3: Holding information 
4 prof A31: The act of holding 
4 prof A31.1: Holding information 
without transformation 

270 hold: Hold a meaning unit in store.
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 



271 hold-increment: Go gradually when 
adding a meaning unit to your 
store. (Edward, Hanne, Harold, 
Inge) 

4 prof A31.2: Writing down 
information (taking notes) 

272 ab-get: Obtain a written statement 
for an abstract. (Edward, Hanne, 
Harold) 

273 ab-increment: Go gradually when 
obtaining a written statement for 
an abstract. (Hanne, Harold) 

274 ab-stop: Stop abstracting 
statements from the original. 
(Hanne) 

4 prof A32: Relevance judgements 
4 prof A32.1: General 
strategies for relevance 
judgements 

275 fact-over-form: Content 
statements are more important 
than statements about document 
organization. (Edward, Marliese) 

276 interesting: State what is 
interesting. (Edward, Inge) 

277 interpretation: Project a statement 
from the document to an 
established concept of the discipline.
(Andreas, Hanne, Harold,
Inge, Marliese) 

278 relevant: State what is relevant or 
important. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

279 relevant-say: Repeat an important
statement in your own words. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

280 unimportant: State what is 
unimportant. (Marliese)

4 prof A32.2: Relevance
judgements that consider the 
usability of information 
4 prof A32.21: Relevance 
judgements based on 
practical usability 

281 relevant-biblio: The items of the
bibliographic description are 
relevant. (Harold, Inge) 

282 relevant-catch: Catchwords are 
relevant if they express relevant 
concepts. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

283 relevant-in-scope: What lies 
within the scope of the 
information system is relevant. 
(Marliese) 

284 relevant-use: Information that can 
be used is relevant. (Andreas, 
Edward, Hanne, Marliese) 

4 prof A32.22: Relevance 
judgements based on information 
quality 

4 prof A32.221: Relevance
judgements based on informational 
reliability 

285 no-doubt: If in doubt, leave it out.
(Edward, Hanne, Harold) 

286 no-publicity: Get rid of publicity.
(Hanne, Inge, Marliese) 

287 understood-only: Leave out what 
you haven’t understood. (Harold, 
Inge, Marliese) 

4 prof A32.222: Relevance 
judgements based on information 
value 

288 no-truism: Leave out what is
trivial. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

289 no-void: Leave out what lacks
content. (Edward, Hanne, Harold, 
Inge, Marliese) 



290 own-only: No cited results, the 
authors’ own results only. (Hanne, 
Marliese) 

291 relevant-contrast: What stands out 
from the rest is important. (Harold, 
Marliese) 

292 relevant-new: What is new and 
original is important. (Edward, 
Hanne, Harold, Inge, Marliese) 

4 prof A32.23: Relevance
judgements based on factual
importance

293 last-state: The last state of a 
historical development is 
important. (Hanne) 

294 no-reasons: Note the fact and not 
the reasons behind it. (Hanne) 

295 relevant-by-fact-known:
Determine the importance of an
item using your own factual 
knowledge. (Andreas, Edward, 
Harold, Inge, Marliese) 

296 relevant-by-fact: Items that are of 
factual significance are important. 
(Andreas, Edward, Hanne, Harold) 

297 relevant-cited: Authors who are 
cited are important. (Edward, Inge) 

298 relevant-causal: In a causal 
(historical) development only 
reason and result are important. 
(Hanne) 

299 relevant-doc-feature: Document 
characteristics are relevant. 
(Edward) 

300 relevant-fact: Factual information
is relevant, other information 
types are not. (Andreas, Edward, 
Hanne, Inge, Marliese) 

301 relevant-impact: What has
(practical) consequences is 
relevant. (Edward, Hanne) 

302 relevant-known: What is known to
be important is relevant. (Edward)

303 relevant-result: The result is
relevant. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

304 relevant-substance: Chemical
substances are relevant. (Edward) 

305 relevant-theory: Concepts that are 
important in theory are relevant. 
(Hanne) 

306 relevant-whole: The whole is 
relevant, not its parts. (Hanne, 
Inge, Marliese) 

4 prof A32.3: Relevance assignment
based on indicators derived from
text organization
4 prof A32.31: Relevance
assignment based on formal text 
organization 

307 relevant-content: What is to be 
found in the table of contents is 
relevant. (Hanne, Harold, Inge, 
Marliese) 

308 relevant-caption: What figures in
captions of graphs and tables is 
relevant. (Edward) 

309 relevant-heading: What is to be
found in headings is relevant. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

310 relevant-title: The document title
is relevant. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

4 prof A32.32: Identifying relevant 
(central) text passages 

311 insist: A topic insisted on by the 
author is important. (Edward, 
Harold) 

312 positive: What is positively stated 
is important, negative statements 
are not. (Marliese) 

313 relevant-cover: Meaning items are 
important if they represent a large 
volume of material in the original. 
(Andreas, Edward, Hanne, Inge) 



314 relevant-call: Meaning units are
relevant that can be linked directly 
to the document topic. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

315 relevant-formhint: Meaning units 
are relevant that are highlighted by 
layout features. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

316 relevant-present: Features of 
presentation are important 
(drawings, program code, etc.). 
(Edward, Harold) 

317 relevant-scheme: Meaning units 
are relevant that belong to the 
document scheme or to other 
important schemata. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

318 relevant-summary: Ready-made 
summaries from the original are
relevant. (Andreas, Edward, Hanne, 
Harold, Inge) 

319 relevant-texthint: Meaning units
marked with positive author’s 
indicator phrases are important. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

320 relevant-topic-sentence: Topic 
sentences from the original are 
relevant. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

321 relevant-unit: What is to be found 
at the beginning of a text unit or at 
its end tends to be relevant. 
(Andreas, Edward, Hanne, Harold, 
Inge) 

4 prof A32.33: Identifying
irrelevant (marginal) text passages 

322 drop-texthint: Meaning units
marked with negative author’s 
indicator phrases are unimportant. 
(Edward, Hanne, Marliese) 

323 no-citation: Do not cite. (Harold) 





324 no-comment: Leave out
explanations, comments,
modifications, etc. (Edward,
Hanne, Harold, Inge, Marliese)

325 no-details: Detail is unimportant.
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

326 no-examples: Examples are
unimportant. (Edward, Hanne, 
Harold, Inge) 

327 no-generalities: General
statements are unimportant. 
(Andreas, Harold, Marliese) 

328 no-marginalia: Leave out 
incidental information. (Hanne) 

4 prof B: Information presentation 
4 prof B1: Abstracting
4 prof B11: Reworking of
informational content
4 prof B11.1: Determining the form
of presentation 

329 data: Present data as data 
information. (Marliese) 

330 list-of-items: Present data in the
form of lists. (Harold, Inge, 
Marliese) 

331 presentation: Find an efficient 
presentation for information. 
(Edward) 

332 text: Present information in text 
form. (Marliese) 

4 prof B11.2: Upgrading
information quality 

333 assess-inform: Check the
information value of an item. 
(Edward, Hanne) 

334 complete: Ensure information is 
complete. (Edward, Hanne, Harold, 
Inge) 

335 concrete: Give concrete and precise 
statements. (Andreas, Inge) 



336 detail-of-treatment: Characterize
the detail of the original treatment.
(Harold, Marliese) 

337 fact-in: Include as many facts, data, 
names, etc., as possible. (Harold, 
Inge) 

338 logical: Ensure that the 
information in your abstract is 
logical. (Marliese) 

339 quantity: Give quantitative 
information. (Harold, Inge) 

340 responsible: Make clear who is 
responsible for an information 
item: the abstractor, the author, 
etc. (Edward, Harold) 

341 science: Stay scientific in your 
argumentation. (Hanne) 

342 source-of-information: Decide
about your information sources.
(Edward, Inge, Marliese)

343 substance: Write an abstract with 
substance. (Inge) 

344 weigh: Weigh the information 
value of individual statements. 
(Edward) 

4 prof B11.3: Increasing the
information density

345 concise: Be as concise as possible 
without being unclear. (Hanne) 

346 condense: Find an equivalent 
formulation which is shorter. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

347 header: Represent a list by its 
header or header sentence. You may 
omit the list. (Andreas, Harold) 

348 once: Say it only once. (Andreas, 
Edward, Hanne, Harold, Marliese)

349 range: Give a range instead of 
individual values. (Harold) 

350 reduce-list: Represent a list by
selected list items. (Andreas, 
Hanne)

351 state-important: Insist on stating
important information, where 
necessary at the expense of less 
important items. (Marliese) 

4 prof B11.4: Improving
understandability

352 add-inform: Add information that 
ensures understandability. 
(Andreas, Hanne, Harold, Marliese) 

353 clarity: Make sure you are as clear 
as possible. (Edward, Hanne, 
Harold, Inge, Marliese) 

354 definition: Define and/or explain 
important facts if necessary. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

355 term: Use a term or expression that 
is better (more pertinent, to the 
point). (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

4 prof B11.5: Ensuring fidelity

356 author: Use the author’s own 
words. (Edward, Hanne, Harold, 
Marliese) 

357 fidelity: Convey what the original 
says, not your own opinions. 
(Edward) 

358 true: When changing the 
presentation form, make sure that 
the content remains intact. 
(Andreas, Edward, Inge, Marliese) 

4 prof B11.6: Upgrading
information structure

359 abstract-outline: Emphasize the 
abstract outline. (Inge) 

360 gist-of-document: In the abstract, 
make the document outline 
transparent. (Andreas, Harold,
Inge, Marliese)

361 level-out: Keep to a determined 
level of detail. (Andreas, Edward, 
Hanne, Harold, Marliese) 



4 prof B11.7: Implementing the
communication function

362 appeal: Produce an attractive 
abstract. (Inge) 

363 communication-goal: Determine 
your communication goal.
(Edward)

364 feeling: Convey the atmosphere 
and the tone of the original. 
(Harold) 

365 mediate: Mediate between the 
intentions of the author and the 
interests of the audience. (Inge) 

366 no-evaluation: Do not evaluate the
document. (Harold, Inge) 

367 reader: Orient yourself towards 
your readers, their prior knowledge 
and interests. (Edward, Harold, 
Inge, Marliese) 

368 reader’s-knowledge: Determine the
knowledge prerequisites that 
readers need. (Harold) 

369 refer: Refer the reader to the 
original or to more special 
abstracts. (Marliese) 

370 State important 
relations to other documents. 
(Andreas) 

4 prof B12.1: Controlling abstract
construction

371 con-stop: Stop abstract 
construction. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

372 construct: Construct an abstract 
passage. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

373 construct-increment: Construct an 
abstract passage incrementally. 
(Andreas, Edward, Hanne, Harold, 
Inge) 

374 expand: Start out from your notes
and expand them from original
passages. (Andreas, Edward,
Hanne, Harold, Inge, Marliese)

375 outline: Structure your abstract
using a standard outline if 
possible. (Hanne) 

376 outline-increment: Structure your 
abstract incrementally. (Edward,
Hanne, Inge) 

377 start-abstract: Begin to write your 
abstract. (Edward, Hanne, Harold, 
Inge) 

4 prof B12.2: Determining the
abstract structure
4 prof B12.21: Elaborating the
overall abstract structure

378 abstract-type: Determine the 
current abstract type. (Harold) 

379 blueprint: Sketch a text unit in any 
suitable form, e.g., graphically. 
(Edward) 

380 con-introduction: Begin the 
abstract with an introductory 
sentence. (Marliese) 

381 con-scheme: Use general-purpose 
schemata for information 
organization. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

382 concentrate-on-title: Concentrate
on the document title before 
writing the abstract, e.g., by 
reading it. (Edward, Hanne, Harold) 

383 final: Get a concluding sentence 
for the abstract. (Edward, Hanne) 

384 follow-summary: Base your
abstract on an in-text summary
from the original. (Andreas, 
Edward, Inge) 

385 length: Respect the maximum 
abstract length restrictions. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

386 mimic: Re-use the original 
document structure for your 
abstract. (Andreas, Edward, Hanne, 
Marliese) 



387 topic-first: Begin the abstract with
a topic sentence. (Andreas, Edward, 
Inge, Marliese) 

4 prof B12.22: Producing
individual statements

388 construct-plan: Configure the
content of a statement without 
considering its formulation. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

389 doc-audience: State the document 
audience in your abstract. (Harold) 

390 get-document-feature: Include 
important document features, 
especially the document type. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

391 get-feature: Include important
features of the described object. 
(Edward, Hanne, Marliese) 

392 get-filler: Find content for a 
planned meaning component. 
(Andreas, Edward, Hanne, Harold) 

393 material: Include good material 
from the original in the abstract. 
(Andreas, Marliese) 

394 place: Find a suitable place for a 
statement. (Edward, Hanne, 
Marliese) 

395 role: Determine which role a text 
passage is to assume in the 
abstract. (Andreas, Harold, 
Marliese) 

396 topic: Construct a topic sentence. 
(Edward, Hanne) 

4 prof B12.3: Formulating 
4 prof B12.31: Controlling the
formulation 

397 form-increment: Formulate
incrementally. (Andreas, Edward,
Hanne, Harold, Inge)

398 formulation: Formulate a text
passage. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

399 preform: Elaborate a formulation 
orally before writing it down. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

400 translate: Translate a text passage. 
(Andreas) 

401 translate-increment: Translate
incrementally. (Andreas) 

4 prof B12.32: Acquiring
ready-made formulations

402 acronym: Use well-known 
abbreviations. (Edward, Inge, 
Marliese) 

403 citation: Cite passages from the 
original. (Hanne, Harold, Inge) 

404 con-content: Obtain abstract 
passages from the table of 
contents. (Inge, Marliese) 

405 con-heading: Obtain abstract 
passages from headings of the 
original. (Andreas, Edward, Harold, 
Inge) 

406 con-title: Integrate the document 
title, or part of it, in the abstract. 
(Andreas, Edward, Inge) 

407 pattern: Use standard patterns of 
formulation. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

408 ready-made: Use ready-made text 
passages from the original. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

4 prof B12.33: Integrating text
modules

409 balance: Give formulations and 
meaning components an 
acceptable balance. (Edward, Inge) 

410 connect: Connect individual 
statements to compose texts.
(Andreas, Edward, Hanne, Harold,
Inge, Marliese) 

411 reorganize: Reorganize text
passages to make them fit in a new 
context. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

4 prof B12.34: Elaborating style

412 emphasize: Highlight important 
elements by rhetorical 
reinforcement. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

413 first-name: Include first names of 
persons. (Edward) 

414 no-rep: Avoid repetitions. 
(Edward, Hanne, Harold, Inge, 
Marliese) 

415 reinforce: Emphasize meaning 
structures by formal means, e.g., 
numbering. (Andreas, Edward) 

416 style: Write good style. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

4 prof B12.35: Correct language
use

417 correct: Be correct in your 
language use. (Hanne, Inge, 
Marliese) 

418 orthograph: Use correct spelling. 
(Edward, Harold) 

419 pronounciation: Pronounce 
correctly. (Edward) 

420 punctuation: Be correct in your 
punctuation. (Andreas, Edward, 
Inge) 

421 sentence: Build a correct sentence. 
(Andreas, Marliese) 

422 tempus: Use tenses correctly.
(Marliese) 

4 prof B12.36: Writing typical
abstract style

423 active: Use active sentences where 
possible. (Hanne, Harold)

424 direct: Be as direct as possible in
your expression. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

425 moderate: Be moderate in your 
statements. (Andreas, Edward, 
Hanne, Inge) 

426 no-clumsy: Avoid clumsy
constructions or units. (Marliese) 

427 nominal: Nominalize expressions 
for use in lists. (Andreas) 

428 precise: Express yourself as 
precisely as possible. (Edward, 
Harold, Marliese) 

429 present: Use the present tense 
where possible. (Harold) 

430 short-sentence: Build short 
sentences. (Andreas, Edward, 
Hanne, Harold, Marliese) 

431 third-person: Use the third person 
in your statements. (Edward) 

4 prof B2: Indexing and classifying
4 prof B21: General strategies of
indexing and classifying

432 control-vocab: Use a controlled 
vocabulary for information 
presentation. (Andreas, Inge) 

433 cover: Cover important 
information units (e.g., chapters) 
by a descriptor or a classification. 
(Andreas, Hanne) 

434 database: Adapt indexation and
classification to the target
database. (Marliese) 

435 distribute: Express yourself using 
the most appropriate presentation 
means. (Andreas, Edward) 

436 ind-environ: Use information from 
the environment of a document to 
find out about the document theme. 
(Hanne, Harold, Marliese) 

437 limited-expression: Expect 
indexing languages and 
classification systems to be
limited in their expressive power.
(Edward, Inge) 

438 up-to-date: Ensure that your target 
vocabulary is at least as up-to-date 
as your document. (Inge) 

4 prof B22: Indexing 
4 prof B22.1: Controlling the 
indexing process 

439 get-index-term: Form a free 
indexing term. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

440 ind-increment: Form a descriptor 
or a combination of descriptors 
incrementally. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

441 ind-stop: Stop an indexing 
process. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

442 index: Represent a central 
document concept by a descriptor 
or a combination of descriptors. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

443 start-indexing: Start indexing.
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

4 prof B22.2: Performing indexing 
4 prof B22.21: Determining 
indexing concepts using the 
document structure 

444 ind-abstract: Use concepts from 
the abstract for indexing. (Harold) 

445 ind-bookindex: Draw indexing 
concepts from the back-of-the- 
book index or comparable 
instruments. (Edward, Harold) 

446 ind-caption: Draw indexing 
concepts from captions. (Edward) 

447 ind-heading: Draw indexing 
concepts from headings. (Edward, 
Hanne, Harold) 

448 ind-outline: Draw indexing 
concepts from the document
outline. (Edward, Hanne, Inge,
Marliese) 

449 ind-title: Draw indexing concepts
from the title. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

4 prof B22.22: Determining
indexing concepts using content
criteria

450 ind-fact: Index essential facts. 
(Andreas, Edward, Harold) 

451 ind-identifier: Index important 
names, e.g., systems. (Edward, 
Harold) 

452 ind-theme: Index the document 
theme. (Andreas, Edward, Hanne, 
Harold, Inge, Marliese) 

453 ind-volume: Assign a descriptor 
only if it represents sufficient 
material in the document. (Edward, 
Hanne, Harold, Marliese) 

4 prof B22.23: Elaborating indexing
concepts

454 ind-analysis: Analyze the
document theme using thesaurus 
concepts and relations. (Edward, 
Inge) 

455 ind-basis: Set up a sufficient 
information base for indexing. 
(Inge) 

456 ind-choice: Develop a choice of
indexing terms or descriptors. 
(Edward, Hanne, Marliese) 

457 ind-defined: Use only well-defined
concepts for indexing. (Edward, 
Harold) 

458 ind-plan: Determine an indexing 
concept without considering its 
formulation. (Edward, Hanne, 
Harold, Inge) 

459 ind-restrict: Restrict yourself to 
important concepts, do not index 
others. (Andreas, Edward, Hanne, 
Harold) 



460 ind-standard: Form a standard 
indexing concept. (Edward, Harold, 
Inge) 

4 prof B22.24: Technical 
elaboration of the indexation 

461 ind-best: Use the descriptor that 
fits best. (Andreas, Edward, Hanne, 
Inge, Marliese) 

462 ind-bound: Use a general indexing 
concept only in combination with 
a normal descriptor. (Hanne, 
Harold) 

463 ind-broader: Use a broader term for 
indexing. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

464 ind-combi: Combine descriptors to 
render a concept, or use pre- 
combined descriptors from the 
thesaurus. (Edward, Hanne, Inge, 
Marliese) 

465 ind-depth: Obtain a reasonable 
indexing depth. (Hanne, Harold) 

466 ind-guide: Index to guide the right 
readers to the right document. 
(Edward, Harold, Marliese) 

467 ind-index: Link suitable pairs of 
descriptors with an index. (Hanne, 
Marliese) 

468 ind-link: Determine relations 
between indexing concepts. 
(Edward, Inge) 

469 ind-precise: Present the object of 
description as precisely as 
possible in indexing. (Andreas, 
Edward, Hanne, Harold) 

470 ind-safe: Index safely, add a 
descriptor if it enhances retrieval 
safety. (Edward, Hanne, Harold) 

471 ind-specific: Use specific instead 
of more general indexing 
concepts. (Edward, Hanne, Harold) 

472 main-desc: Mark main descriptors. 
(Andreas, Edward, Marliese) 

473 qualifier: Mark descriptors as 
qualifiers. (Marliese) 

474 second-desc: Mark secondary 
descriptors. (Edward) 

4 prof B22.25: Using the thesaurus 

475 desc-status: Determine the status of 
a descriptor before using it. 
(Hanne) 

476 descriptor: Check if an expression 
is a descriptor before using it. 
(Andreas, Edward, Hanne, Inge) 

477 ind-drop: Avoid concepts which 
cannot be represented with the 
thesaurus, or which cause dif- 
ficulties. (Andreas, Edward, 
Marliese) 

478 ind-follow: Follow thesaurus 
relations to find suitable 
descriptors. (Andreas, Edward, 
Hanne, Inge) 

479 ind-get: Use the thesaurus to 
obtain descriptor candidates. 
(Andreas, Edward, Hanne, Inge) 

480 ind-rules: Stick to the presentation 
rules of the thesaurus. (Edward, 
Inge) 

481 ind-sem: Use thesaurus 
information (scope notes, 
relations, etc.) to understand the 
meaning of a descriptor as 
precisely as possible. (Andreas, 
Edward, Hanne, Marliese) 

482 ind-target: Find indexing concepts 
which can be expressed with the 
target vocabulary. (Edward, Harold) 

483 ind-token: Check the alphabetical 
environment of a descriptor for 
appropriate descriptor candidates. 
(Andreas, Edward, Hanne) 

484 ind-wanted: Propose a new 
descriptor. (Edward) 



4 prof B23: Indexing with a natural 
language phrase 

485 phrase: Form a phrase to 
characterize the document. 
(Andreas) 

486 phrase-con: Use the permitted 
connectors in the phrase. 
(Andreas) 

487 phrase-fact: Include important 
facts in the phrase. (Andreas) 

488 phrase-ind: Use elements of your 
indexation in the phrase. (Andreas) 

489 phrase-theme: Express the 
document theme in the phrase. 
(Andreas) 

490 phrase-top: Express topic 
sentences of the document in the 
phrase. (Andreas) 

4 prof B24: Classifying 
4 prof B24.1: Controlling 
classification 

491 class-increment: Elaborate 
classification notations 
incrementally. (Edward, Inge) 

492 class-stop: Stop classifying. 
(Edward, Hanne, Inge) 

493 classify: Classify a document. 
(Andreas, Edward, Hanne, Inge, 
Marliese) 

494 start-classify: Begin to classify. 
(Andreas, Edward, Inge) 

4 prof B24.2: Performing 
classification 

4 prof B24.21: Determining 
classification concepts using the 
document structure 

495 class-outline: Draw classification 
concepts from the document 
outline. (Inge) 

496 class-title: Draw classification 
concepts from the title. (Andreas, 
Inge, Marliese) 

4 prof B24.22: Determining 
classification concepts according to 
content criteria 

497 class-theme: Express the document 
theme by a classification notation. 
(Andreas, Inge) 

498 discipline: Classify an object of 
description in a discipline. 
(Andreas, Edward, Hanne, 
Marliese) 

499 specify-class: Determine the target 
class. (Edward) 

4 prof B24.23: Elaborating 
classification concepts 

500 class-plan: Configure the content 
of a classification notation, 
disregarding the notational form. 
(Inge) 

501 class-standard: Find a standard 
formulation before looking for a 
classification notation. (Inge) 

4 prof B24.24: Technical 
elaboration of the classification 

502 class-best: Use the class that fits 
best. (Andreas, Edward, Hanne, 
Inge) 

503 class-choice: Look at a choice of 
classes to determine the most 
suitable one. (Edward, Hanne) 

504 class-combi: Build a combined 
notation. (Edward, Inge) 

505 main-class: Determine the main 
class. (Hanne) 

4 prof B24.25: Using the 
classification system 

506 class-location: Find out where in 
the system a class is located. 
(Inge) 

507 class-rules: Follow the 
presentation rules of the 
classification system. (Edward, 
Inge) 



508 class-sem: Consider the semantic 
relations of the classification 
systems, also implicit ones. (Inge) 

4 prof B3: Bibliographic 
description 

4 prof B31: Controlling 
bibliographic description 

509 biblio: Determine the data of 
bibliographic description. 
(Andreas, Edward, Inge, Marliese) 

510 pro-biblio: Produce the 
bibliographic description. (Inge) 

4 prof B32: Performing 
bibliographic description 

511 biblio-relation: Note central 
bibliographic relations to other 
documents. (Andreas) 

512 edition: State publisher, place, and 
year of publication. (Inge, 
Marliese) 

513 erstautor: State the first author. 
(Andreas) 

514 get-author: Determine the author. 
(Inge) 

515 isbn: Determine the ISBN. (Inge) 

516 journal: Determine the journal by 
name, volume, issue and pages. 
Analogously for collective works 
and proceedings. (Andreas, Inge, 
Marliese) 

517 number: Note the control number. 
(Andreas) 

518 pages: Note the pages where the 
document appears. (Andreas, Inge) 

519 series: Determine the series. 
(Marliese) 

520 subtitle: Determine the subtitle. 
(Marliese) 

521 title: Determine the title. (Andreas, 
Inge, Marliese) 

4 prof B4: Revising 

4 prof B41: Controlling revision 

522 rev-increment: Revise 
incrementally. (Andreas, Edward, 
Hanne, Harold, Inge, Marliese) 

523 rev-stop: Stop revising. (Andreas, 
Edward, Hanne, Harold, Inge, 
Marliese) 

524 revise: Revise your product. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

525 start-revision: Begin to revise. 
(Andreas, Edward, Hanne, Harold, 
Inge, Marliese) 

4 prof B42: Revising specific 
products 

4 prof B42.1: Revising an abstract 
4 prof B42.11: Improving the 
language of the abstract 

526 rev-after: Check for text coherence 
after cancelling a passage. 
(Andreas, Edward, Marliese) 

527 rev-coherence: Improve textual 
coherence. (Hanne) 

528 rev-flow: Improve sentence flow. 
(Edward) 

529 rev-language-use: Improve 
correctness of language use. 
(Edward) 

530 rev-legible: Improve legibility of 
your text. (Andreas, Edward) 

531 rev-ortho: Improve your spelling. 
(Inge) 

532 rev-sequence: Check and improve 
the sequence of the abstract 
statements. (Edward, Harold, 
Marliese) 

533 rev-style: Improve the style of the 
abstract. (Andreas, Edward, Hanne, 
Inge, Marliese) 

534 rev-tempus: Improve tense use. 
(Marliese) 



4 prof B42.12: Improving the 
information quality of the abstract 

535 rev-clear: Improve 
understandability. (Andreas, 
Edward, Hanne, Harold, Marliese) 

536 rev-complete: Improve the 
completeness of the abstract. 
(Edward, Hanne, Harold, Marliese) 

537 rev-con-check: Check if the 
abstract fits the text theme by 
comparing the title and the last 
abstract sentence. (Hanne) 

538 rev-concise: Improve concision 
and density. (Harold, Marliese) 

539 rev-contour: Make the information 
organization more transparent. 
(Harold, Marliese) 

540 rev-correct: Improve matter-of-fact 
correctness. (Edward, Hanne, 
Harold, Inge, Marliese) 

541 rev-length: Adapt abstract length 
(to the norm). (Andreas, Hanne) 

542 rev-precise: Improve the precision 
of expression. (Edward, Marliese) 

543 rev-principle: Check adherence to 
abstracting principles. (Marliese) 

4 prof B42.2: Revising indexations 

544 rev-ind-correct: Improve the 
formal correctness of the 
indexation. (Hanne) 

545 rev-ind-content: Improve the 
factual correctness of the 
indexation. (Andreas, Edward, 
Hanne, Harold) 

546 rev-ind-complete: Improve the 
completeness of the indexation. 
(Andreas, Edward, Hanne, Harold) 

547 rev-ind-index: Check which 
descriptors have been used. 
(Hanne) 

548 rev-ind-reduce: Try to cancel 
descriptors. (Hanne) 

549 rev-index: Revise your indexation. 
(Andreas, Edward, Hanne, Harold) 

4 prof B42.3: Revising a 
classification 

550 rev-class: Revise a classification. 
(Andreas) 

551 rev-class-by-thes: Revise a 
classification with the help of the 
thesaurus. (Andreas) 

552 rev-class-content: Improve the 
factual correctness of a 
classification. (Andreas) 



4.4.2 Alphabetical index of strategies 




ab-get 272 
ab-increment 273 
ab-stop 274 
abbrev 129 
abort 66 
abstract-outline 359 
abstract-type 378 
acronym 402 
activate 107 
active 423 
adapt 20 
add-inform 352 
aggregate 108 
alternative 46 
annotate 154 
answer 196 
appeal 362 
apply 109 
argument 208 
arrow 163 
aside 40 
assess-inform 333 
asterisk 164 
audience 41 
author 356 
author’s-voice 209 
avoid 21 

background 110 
backpoint 93 
balance 409 
better 216 
biblio 509 
biblio-relation 511 
blocked 8 
blueprint 379 
box 165 
bridge 205 
browse 236 
by-form 190 

calculate 134 
cancel 166 
capitals 155 
check 85 
check-inform 111 
check-plan 47 
cheer 13 
circle 167 
citation 403 
clarity 353 
class-best 502 
class-choice 503 
class-combi 504 
class-increment 491 
class-location 506 
class-outline 495 
class-plan 500 
class-rules 507 
class-sem 508 
class-standard 501 
class-stop 492 
class-theme 497 
class-title 496 
classify 493 
collect 79 
combine 112 
comfort 9 
communication-goal 363 
comp-source 217 
compare 113 
complete 334 
con-content 404 
con-heading 405 
con-introduction 380 
con-scheme 381 
con-stop 371 
con-title 406 
concentrate-on-title 382 
concentration 10 
concise 345 
conclusion 221 
concrete 335 
condense 346 
confirm 197 
connect 410 
consequent 22 
construct 372 
construct-increment 373 
construct-plan 388 
content 264 
continue 67 
contour 198 
control-vocab 432 
core-in-abstract 222 
correct 417 
count 114 
course 191 
cover 433 
critique 98 
cross 168 
cross-out 169 
current 1 

current-position 184 

data 329 
database 434 
decide 62 
define-filter 231 
define-task 48 
definition 354 
degree 130 
desc-status 475 
descriptor 476 
detail-of-treatment 336 
determined 14 
direct 424 
discipline 498 
discussion 223 
distribute 435 
disturbed 34 
doc-audience 389 
document-feature 179 
done 68 

drop-texthint 322 
dummy 115 

earmark 80 
easy 49 
economy 87 
edition 512 
elaborate 206 
emphasize 412 
enough 88 
error 99 
erstautor 513 
estimate-number 116 
estimate-problem 50 
estimate-task 51 
evaluate 117 
ex-form 260 
example 237 
exclude 81 



expand 374 
experience 118 
experiment 35 
explain 52 
explore 232 
expression 2 

fact-in 337 

fact-over-form 275 

failure 100 

feasible 23 

feeling 364 

fidelity 357 

final 383 

first 256 

first-last 257 

first-name 413 

fit 11 

flaw 180 

follow 192 

follow-summary 384 

footnote 265 

foresee 53 

form-increment 397 

form-reflects-content 224 

formula 225 

formulation 398 

generalization 135 
generate-and-evaluate 54 
get-author 514 
get-document-feature 390 
get-feature 391 
get-filler 392 
get-index-term 439 
get-overview 185 
gist-of-document 360 
goal 55 

head-in-text 226 
header 347 
heading 266 
hint 186 
hold 270 



hold-increment 271 
hope 254 
hypothesis 119 

idea 138 
ignore 15 
image 238 
imagine 139 

imagine-and-evaluate 140 
improve 24 
increment 76 
ind-abstract 444 
ind-analysis 454 
ind-basis 455 
ind-best 461 
ind-bookindex 445 
ind-bound 462 
ind-broader 463 
ind-caption 446 
ind-choice 456 
ind-combi 464 
ind-defined 457 
ind-depth 465 
ind-drop 477 
ind-environ 436 
ind-fact 450 
ind-follow 478 
ind-get 479 
ind-guide 466 
ind-heading 447 
ind-identifier 451 
ind-increment 440 
ind-index 467 
ind-link 468 
ind-outline 448 
ind-plan 458 
ind-precise 469 
ind-restrict 459 
ind-rules 480 
ind-safe 470 
ind-sem 481 
ind-specific 471 
ind-standard 460 
ind-stop 441 
ind-target 482 
ind-theme 452 
ind-title 449 
ind-token 483 
ind-volume 453 
ind-wanted 484 
index 442 
inference 136 
insert 156 
insist 311 
interest 56 
interesting 276 
interpretation 277 
interrupt 36 
intertext 212 
isbn 515 

issue-of-method 25 

journal 516 
jump 193 
justify 63 

know 131 
knowledge 132 

label 199 

language 42 

last 258 

last-check 86 

last-state 293 

last-try 101 

later 105 

layout 177 

length 385 

level-in 261 

level-out 361 

limited-expression 437 

limits 26 

link 170 

list-of-items 330 
location 187 
log-in 82 
log-out 83 
logical 338 



look-ahead 57 
look-up 120 

macro sentence 200 
main-class 505 
main-desc 472 
main-point 201 
mark 171 
mark-list 178 
mark-stop 172 
marked 252 
material 393 
means-ends 89 
mediate 365 
memo 157 
memorize 121 
memory 122 
method 27 
mimic 386 
missing 181 
moderate 425 
monitor 12 
more 90 
motive 210 
multiple-use 218 

next 69 
next-stage 70 
nice 43 

no-citation 323 
no-clumsy 426 
no-comment 324 
no-details 325 
no-doubt 285 
no-evaluation 366 
no-examples 326 
no-generalities 327 
no-marginalia 328 
no-op 71 
no-publicity 286 
no-reasons 294 
no-rep 414 
no-title 267 
no-truism 288 



no-void 289 
nominal 427 
number 517 

omission 102 
once 348 
once-in 219 
open 239 
open-point 58 
open-question 133 
opinion 3 
order 28 
orient 188 
orthograph 418 
outline 375 

outline-increment 376 
overview 240 
own-only 290 
own-way 16 
own-words 123 

pages 518 
parallel 77 
pass 194 
pattern 407 
personal 4 
phrase 485 
phrase-con 486 
phrase-fact 487 
phrase-ind 488 
phrase-theme 489 
phrase-top 490 
physical 44 
place 394 
plan 59 
polite 45 
positive 312 
pragmatic 213 
precedent 29 
precise 428 
precond 72 
preform 399 
prepare-decision 64 
preps 37 
present 429 
presentation 331 
pro-biblio 510 
production 84 
pronounciation 419 
proof 241 
punctuation 420 

qualifier 473 
quantity 339 
question 202 

range 349 
read 142 
read-back 143 
read-fast 144 
read-find 145 
read-form 146 
read-free 147 
read-image 148 
read-on-demand 149 
read-open 150 
read-over 151 
read-param 152 
read-proof 153 
reader 367 

reader’s-knowledge 368 
ready-made 408 
reason 137 
reduce-list 350 
redundant 220 
refer 369 
references 214 
refresh 94 
reinforce 415 
related-doc 370 
relevant 278 
relevant-biblio 281 
relevant-by-fact 296 
relevant-by-fact-known 295 

relevant-call 314 
relevant-caption 308 
relevant-catch 282 



relevant-causal 298 
relevant-cited 297 
relevant-content 307 
relevant-contrast 291 
relevant-cover 313 
relevant-doc-feature 299 
relevant-fact 300 
relevant-formhint 315 
relevant-heading 309 
relevant-impact 301 
relevant-in-scope 283 
relevant-known 302 
relevant-new 292 
relevant-present 316 
relevant-result 303 
relevant-say 279 
relevant-scheme 317 
relevant-substance 304 
relevant-summary 318 
relevant-texthint 319 
relevant-theory 305 
relevant-title 310 
relevant-topic-sentence 320 
relevant-unit 321 
relevant-use 284 
relevant-whole 306 
reorganize 411 
repeat 95 
reset 96 

responsible 340 
retract 103 
retrieve 242 
return 97 

return-to-source 182 
rev-after 526 
rev-class 550 
rev-class-by-thes 551 
rev-class-content 552 
rev-clear 535 
rev-coherence 527 
rev-complete 536 
rev-con-check 537 
rev-concise 538 
rev-contour 539 



rev-correct 540 
rev-flow 528 
rev-increment 522 
rev-ind-complete 546 
rev-ind-content 545 
rev-ind-correct 544 
rev-ind-index 547 
rev-ind-reduce 548 
rev-index 549 
rev-language-use 529 
rev-legible 530 
rev-length 541 
rev-ortho 531 
rev-precise 542 
rev-principle 543 
rev-sequence 532 
rev-stop 523 
rev-style 533 
rev-tempus 534 
revise 524 
role 395 
roter-faden 203 
rules 30 

sample 243 
science 341 
search 244 
second-desc 474 
select 245 
self 5 

sentence 421 
sentiment 6 
sequence-number 158 
series 519 
shallow 246 
short-sentence 430 
shorthand 159 
signature 160 
single-out 173 
skim 253 
skip 195 

source-of-information 342 
sources 215 
space 174 
special-problems 60 
specify-class 499 
speed-up 78 
standpoint 7 
start 73 

start-abstract 377 
start-classify 494 
start-decision 65 
start-explore 233 
start-indexing 443 
start-revision 525 
start-writing 161 
state-doctype 268 
state-important 351 
state-principle 31 
state-problem 124 
state-result 125 
stop 74 

stop-explore 234 
stroll-along 17 
structure 247 
style 416 
subquestion 126 
substance 343 
subtitle 520 
suggestion 141 
sum-in-text 227 



sum-up 127 
suspend 75 
switch 248 

table 249 
task 18 
tech 38 
tempus 422 
tentative 32 
term 355 
test-index 230 
text 332 
text-image 228 
text-level 262 
texthint 255 
think-yourself 207 
third-person 431 
thorough 33 
tick 175 
time-limit 92 
time 91 
title 521 

title-information 269 
tools 39 
top-level 263 
topic 396 
topic-feature 204 



topic-first 387 
topic-sentences 259 
translate 400 
translate-increment 401 
trial-and-error 128 
true 358 
try-again 19 

underline 176 
understood-only 287 
unimportant 280 
unit 250 
unsolved 104 
up-to-date 438 
use-doctype 229 
use-index 189 

volume 183 
wait-and-see 106 
weigh 344 
who-author 211 
word 251 
working-plan 61 
write 162 
zoom 235 




5 Computational Approaches 



5.1 Introduction 



5.1.1 Computerized summarization presupposes 
a computerized situation 




Fig. 5.1. Computerized situated summarizing 



We have so far considered summarization and communication in communica- 
tion situations which from a technical point of view were very simple. A typi- 
cal mass communication situation was mentioned, where in contrast to face-to- 
face communication, the communicators do not have to be in the same place 
at the same time in order to communicate. However, the television channels, 
satellite dishes, telephones, and other telecommunications devices, all the 
technical equipment necessary to make spatiotemporally displaced commu- 
nication possible, remained in the background. For the time being, we assumed 
that technical communication media do not affect the communication of 
contents in mass communication situations. It makes no difference whether the 
summary of the week’s stock exchange developments is transmitted via 
satellite or the telephone network. However, it does make a difference whether 
a summary is produced for television or radio, because the communication 
medium determines the presentation possibilities, and it does make a differ- 
ence whether a summary has to comply with the constraints of an organized 
professional information environment or not. 

If we turn our attention to computerized summarizing, we cannot ignore the 
technical conditions underlying the communication. Computer systems form 
part of the summarization situation and help to determine it. There are many 
areas of life where summarizing takes place that are out of the question for 
computerized summarizing. It is hard to imagine that a little girl in kindergar- 
ten uses a computer when she summarizes the story of Red Riding Hood, as it 
is likely to be some time before she acquires the skills needed to use word pro- 
cessing systems. Computer-assisted communication and summarization in a 
beer parlor is also at the very least highly unusual. Apart from kindergarten, 
other areas of life where literacy in the traditional sense and information tech- 
nology play only a minor role can be eliminated as environments for computer- 
ized summarizing. 

Figure 5.1 characterizes a computerized summarizing environment. It differs 
from the more general representations of the communication and summariza- 
tion situation in earlier chapters in two points: the summarizer may now be a 
computer system and the summary is stored on a computerized medium. It thus 
proposes two forms of computerization which complement each other: 

1. The summarizing communicator is a human being, who uses a computer as 
a tool. The texts to be summarized are received in various forms, on print 
media or as machine-readable full texts. The summaries are perhaps written 
with a word processing system and stored in a data base. They can be 
called up on the screen. Only the few summaries that are destined for use 
are printed out. 

2. A computer program can assume the role of the summarizer. For this, the 
full text must be available in machine-readable form, in other words the 
environment must already be highly computerized. Hence, summaries pro- 
duced by automatic systems are often found in professional contexts. They 
have their place, for example, in information systems, where large numbers 
of documents have to be represented in a shortened form. 

In professional contexts, it goes without saying that mostly specialized texts 
are summarized. Research into automated summarizing is focused on 

• scientific and technical documents in the context of specialized informa- 
tion systems 

• narrative texts, in particular news texts, in the application context of press 
data bases. 

In professional environments, we find special forms of summarizing such as in- 
dexing and information extraction. They may be easier to automate than full- 
scale summarizing. 

Once the summarizing system is assigned the same role in the communica- 
tion process as a human communicator, a natural way of comparing the per- 
formance of the two emerges. It is possible to evaluate the results of both hu- 
man experts and computer systems, assess the intellectual effort that humans 
and systems invest, and determine the reliability and speed with which humans 
and systems can produce summaries. It may well be that at least for traditional 
summarizing tasks, today’s computer systems do not perform anything like as 
well as human summarizers, because humans know much more and can think 
faster and more flexibly. Computer systems are of interest in areas where they 
perform adequately and where human summarizers cannot cope with the 
amount of material to be summarized. What is important is that the products of 
summarization systems must be usable in a human working environment. With- 
out the tie to human summarizing and the utility value that summaries have for 
humans, it will not be possible to develop summarization systems to the 
application stage. 

Computers have only relatively recently become interesting as a medium. As 
long as the presentation quality remained a weakness of computerized infor- 
mation products, it was not worth talking about the media characteristics of a 
computer in a communication situation. The arrival of interactive systems, hy- 
permedia, and multimedia frees automatic summaries from the constraint that 
imposed on all summaries the format of written language text. Both computer- 
ized documents and their summaries may use multimedia presentation, includ- 
ing images or even a sound track. Since media (image, text, graphics, sound, 
etc.) can be chosen deliberately to transmit the information content as clearly 
and concretely as possible, summaries, e.g., scientific abstracts, may change 
their appearance dramatically. It is easy to see how the individual stages of 
progress in computerization add up: 

• Enhanced quality of the medium. A text system is a writing medium with to- 
tally different and in many ways better characteristics than paper. For ex- 
ample, it is easier to make corrections. 

• Better storage and retrieval. A data base system is a retrieval medium that 
offers better storage and retrieval possibilities than a conventional index 
card system. If summaries are stored in an information system, they can be 
retrieved according to different criteria. Abstracts assume an additional 
function as the basis for retrieving the full text and are organized accord- 
ingly. 

• Multimedia presentation. A multimedia system incorporating text, image, 
sound, and direct manipulation offers more presentation possibilities than 
classic print media or audiovisual media such as motion pictures with 
sound. Authors have more registers for expressing themselves and audi- 
ences can interpret more coordinated messages when they acquire for 
themselves the information they require from the author. Whereas in the 
past, the stage with all its accompanying technical paraphernalia was the 
place for presenting multimedia works of art, now comparable effects on 
the computer screen can help to facilitate communication in day-to-day 
professional contexts. For example, summaries may contain atom models 
that the user can rotate, or may simulate welding techniques. 

• Active presentation. A computer system is not only passively scanned by 
an active user but actively does something with the information it holds. It 
checks spelling, or rotates the atom model before the user’s eyes and thus 
surpasses the utility of traditional media because it performs processing as 
well as presentational functions. 

• Adaptivity. Whereas classic print and audiovisual media have an inflexible 
presentation form, computer media can behave interactively and adaptive- 
ly. They react to user input, for example with respect to navigation within 
the medium or by changing the presentation form. They adapt to the 
individual user, by reconstructing the system level (s)he was on whenever 
(s)he starts up the system or by behaving differently according to the user 
profile, which may, for example, contain information about the user’s area 
of specialization. Even though in practice the interactive capabilities of 
systems are still limited, nevertheless the possibility of interacting with 
the user to design a retrieval session, a film, a novel, or a summary is im- 
portant. 

• Computer nets and increased information volume. With the ongoing devel- 
opment of computer nets such as the Internet, information volumes at indi- 
vidual workplaces are multiplying. The wealth of computerized informa- 
tion reinforces the demand for summarization functions which are fit for 
routine use. Functions are needed that give brief accounts of the informa- 
tion relevant to a certain question, or indexes which help to locate directly 
the places where pertinent online information can be found. In the past, re- 
searchers could argue that it would be nice and perhaps more practical in 
some not too distant future to have automatic summarizing functions. 
Since the computer network expansion of the 1990s, there is an evident 
need on the part of individual users to have knowledge available from the 
net which is predigested to summary length, and no alternative solution 
competes with automatic summarizing approaches. Thus developers of 
summarization systems are faced with real application needs. 

Multimedia presentation is currently the most conspicuous advance in com- 
puter communication to influence summarizing, but by no means the only one: 
in particular, interactive systems invite us to conceive summarization as an 
interactive process instead of a simple display of prefabricated summaries. The 
consequences for the design of interactive summarization systems are as 
evident as for systems that would tailor summaries to particular uses: the sys- 
tems need more knowledge about what happens during summarization and 
about the use of summaries. System designers must know about the summari- 
zation situation into which they introduce a computerized component. Sum- 
mary users and other persons involved directly or indirectly in summarization 
processes, such as summarizers or authors, find their situation dramatically 
changing in a computerized environment, to a degree that makes a continuous 
reassessment necessary. 

As technical developments free summarization from presentational restric- 
tions, human and computerized summarizers need more cognitive capacity to 
cope with the additional degrees of freedom. In this way, technical progress 
presses for a more articulated and penetrating view of automatic summariza- 
tion, and at the same time reduces the conceptual distance between automatic 
and human summarizing. Cognitive approaches have more impact on system 
design than in the past. After all, competent human summarizers have the most 
elaborate summarization competence available, and more intelligent sum- 
marization concepts can be obtained naturally by studying human skills. 
Where earlier researchers in automatic summarization saw no chance to learn 
anything implementable from the overly complex skills of human summarizers, 
their colleagues of today can define automatic summarizing with respect to 
human performance. 



5.1.2 Overview 

How we wind our path through the landscape of systems for automatic summa- 
rizing is a matter of taste. A presentation which traces the development of re- 
search has the advantage of dealing first of all with the early, particularly 
straightforward systems. Only after looking at them do we deal with the more 
sophisticated later approaches. The possible objection that the early summariz- 
ing systems are only of historical interest does not apply for two reasons: 

• Many ideas from the early days of system development still hold true to- 
day. This applies among other things to the concentration of the summariz- 
ing problem on the selection of statements that are important enough to be 
included in the summary, or to the methods for researching summarizing, 
which were developed at an early stage. 

• Summarization systems whose scientific basis goes back to pioneering 
days are most advanced toward practical application. They therefore have 
a good chance, after some recycling, of being used in practice. 

In the following, we distinguish between systems of the early years, systems 
after the advent of cognitive science, and the systems of the 1990s which con- 
sider full-text bases and multimedia environments: 

• The first information systems appeared in an environment where scientifi- 
cally trained and hence stress-resilient users defied the inclemencies of 
the punched card era with batch processing, difficult-to-read printouts, and 
other handicaps, and at an early date accepted information from electronic 
systems. It was here that the idea of satisfying an information need with 
automatically generated abstracts was born. The first step was taken in 
1958 by H.P. Luhn, who proposed a system for the creation of computer 
abstracts. 

• From around 1980, the first cognitive science-oriented systems appeared. 
In the way they work they look towards human summarizing, especially 
the summarization of stories. Their applicational background was elec- 
tronic news distribution by agencies. News texts predominantly describe 
actions, therefore summarization concepts were adapted from narrative 
texts. 

• Around 1990, the demand for summarizing techniques begins to increase 
because of the transition to large-scale data bases and to computer net- 
works with bigger and bigger information volumes. With the advent of 
multimedia systems, the summarizing systems depart from their depend- 
ence on the written language. The first systems to integrate media are pro- 
posed. 



5.2 Early approaches: The creation of computer abstracts 
by sentence extraction 



As early as 1958, Luhn envisaged “the creation of computer abstracts” as an 
overall task and suggested the extraction method that is still used today, if in a 
more sophisticated way: the important statements are extracted from the 
original document and put together in a short text, the abstract or summary. 
This defined the criteria and direction for subsequent research. 

The first systems for automatic abstracting (the commonest professional form 
of summarizing) were motivated by practical considerations. The systems were 
to produce short versions from technical and scientific documents. Since the 
main concern was to produce anything that vaguely resembled an abstract, the 
criteria for success and the methods used were at first simple. In principle, 
suitable sentences were extracted from the source text and copied into the ex- 
tract or abstract. It would have been foreign to the researchers at that time to 
place their systems in a summarization situation. In the pioneering days of 
automatic abstracting, the scene was determined more by technical difficul- 
ties. Both processing and memory capacities were limited. The programming 
languages were conceived for numerical data processing and could only be 
adapted with difficulty to the processing of character strings. Monitors as a 
means of interaction between user and program were not yet available. The 
deck of cards acted as intermediary between programmer and computer. 

To appreciate the contribution early research made to automatic abstracting, 
it is not enough merely to recall the technical working situation of the time. 
The technical pioneering situation was akin to the theoretical pioneering situa- 
tion: little was known about what happens during summarizing. All that existed 
were global opinions based on practical experience, which are nowhere near 
sufficient for realizing systems. The aim was, therefore, to draw up an initial 
theory about what happens during abstracting, using the presentation 
possibilities of the information processing machines available at the time. 
From today’s standpoint we may object that the first system realizations, as 
theories of abstracting, only scratched the surface. In this respect, however, 
they do not differ from the descriptions in contemporary textbooks. 

Attempts to formulate empirical theories were not long in coming. The be- 
havior of abstracters was studied, in particular the consistency of their deci- 
sions, and efforts were made to find methods for evaluating abstracts, because 
in order to be able to generate better abstracts, quality standards were needed. 
The problem in summarizing was formulated in general terms: its core lies in 
relevance decisions, i.e., in the decision as to which statements (sentences) 
should be included in the abstract and which left out. Relevance decisions 
were already underpinned by several arguments with different backgrounds. A 
lack of textual coherence in automatically generated abstracts was noted. The 
first attempts were made to revise sentences stripped of their original context 
in order to make the abstract more coherent and more readable. Ways of tailor- 
ing abstracts to users were also considered. Many of the viewpoints formulated 
during the first decades of automatic abstracting are still being followed up to- 
day. 

According to MATH72, a computer-based abstracting system of the 1970s 
had to manage the following subtasks: 

• read the document to be abstracted 

• analyze the document 

• apply a set of selection and/or transformation rules to produce the abstract 

• format the resulting abstract 

• print the abstract 

This task list is constrained by the technical possibilities. At that time, 
computer systems were unable to deal with the characters occurring in normal 
printed technical papers. Pre-editing procedures were used to clean up features 
which were of no use because the scanner of 1970 was unable to read them 
and the line-printer was unable to make them visible. Drawings and other 
graphic elements were banished from abstracts and abstracting for the same 
reasons, and discarded during pre-processing. The formatting of the output and 
the resulting presentation were not particularly inspired, a simple printout in 
uppercase letters being the only feasible solution. Document analysis and the 
application of selection rules were the summarization-specific part of the 
procedure, whereas in the other steps, the researchers dealt with problems of 
general file and text processing. 

What happened during document analysis and selection/transformation of 
abstract sentences differed in the individual approaches. We look at four of 
them: 

• the very first automatic abstracting system of LUHN58 

• the remarkable TRW system described by EDMU61, EDMU63 

• the ADAM system of RUSH71 and MATH72 

• the text-oriented sentence selection method proposed by SKOR71, 
SKOR81 



5.2.1 Luhn’s abstracting system 

In conformity with the findings-oriented presentation that is common in ab- 
stracting, the reader is invited to approach Luhn’s abstracting procedure by 
looking first at its result. Figure 5.2 shows the first auto-abstract proposed by 
LUHN58. It was compiled following Luhn’s method of sentence extraction. The 
weights that determined the inclusion of the sentences in the abstract are given 
in brackets behind the individual sentences. The extract of the original can as- 
sume the function of an abstract, at least to a certain degree. 

LUHN58 generated the first automatic abstracts using the following proce- 
dure: 

1. The document to be abstracted was punched onto cards and then transferred 
to magnetic tape. Luhn selected papers that needed no pre-editing. 

2. The text was read, word by word. So-called common words (i.e., non-sub- 
stantive words) were deleted through table look-up. The remaining words, 
called content words, were associated with any punctuation that preceded 
or followed them, and their exact location in the original document was no- 
ted. 

3. The content words were sorted into alphabetical order. 

4. Words of similar spelling were “consolidated” by a rough approximation to 
a stemming algorithm. Successive pairs of word tokens were compared letter 
by letter. Tokens which presented fewer than seven letter mismatches were 
assumed to be of the same word-type, i.e., to represent the same concept. 
The frequency of occurrence of each word-type was then determined and 
low-frequency word-types were deleted. The remaining word-types were 
considered to be significant. 

5. The significant word-types were sorted into location order. 

6. Sentence representativeness was determined. Sentences were divided into 
substrings (clusters), each of which was bounded by significant words 
separated by no more than four non-significant words. Significant words 
separated from other significant words by more than four words were regarded 
as isolated and were not given any further consideration. For each cluster, a 
representativeness value was calculated by dividing the number of significant 
tokens in the cluster by the total number of tokens in the cluster. Sentences 
reaching a representativeness value above a preset threshold, or else a 
predetermined number of sentences reaching the highest values, were selected 
for inclusion in the abstract. 

7. The abstract was printed. 
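
The seven steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Luhn's program: the stop list, the frequency threshold `min_freq`, and the names `same_type`, `significant_words`, `sentence_score`, and `luhn_extract` are invented for the example. The consolidation step additionally requires a shared prefix of four letters (`min_prefix`), an assumption added so that short unrelated words are not merged. The score follows the simple ratio described in step 6; Luhn's 1958 paper is usually cited as squaring the number of significant words before dividing, which would account for weights above 1 such as those in Fig. 5.2.

```python
import re
from collections import Counter

# Illustrative stop list; Luhn's table of common words was far larger.
COMMON_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
                "by", "that", "it", "as", "with", "for", "on"}

def same_type(a, b, min_prefix=4, max_mismatch=6):
    """Step 4: count the letters of both words from the first mismatch onward."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    # min_prefix is an added safeguard (assumption), not part of the description
    return i >= min_prefix and (len(a) - i) + (len(b) - i) <= max_mismatch

def significant_words(words, min_freq=2):
    """Steps 2-4: drop common words, sort, consolidate, filter by frequency."""
    content = sorted(w for w in words if w not in COMMON_WORDS)
    type_of, counts = {}, Counter()
    prev = rep = None
    for w in content:
        if prev is None or not (w == prev or same_type(prev, w)):
            rep = w                      # a new word-type starts here
        type_of[w] = rep
        counts[rep] += 1
        prev = w
    return {w for w, t in type_of.items() if counts[t] >= min_freq}

def sentence_score(tokens, significant, max_gap=4):
    """Step 6: best ratio of significant tokens to cluster length."""
    positions = [i for i, w in enumerate(tokens) if w in significant]
    best, start = 0.0, 0
    for k in range(1, len(positions) + 1):
        # close the cluster at the end or when the gap exceeds max_gap
        if k == len(positions) or positions[k] - positions[k - 1] - 1 > max_gap:
            n_sig = k - start
            if n_sig > 1:                # isolated significant words are ignored
                span = positions[k - 1] - positions[start] + 1
                best = max(best, n_sig / span)
            start = k
    return best

def luhn_extract(text, n_sentences=2):
    """Steps 2-7: return the top-scoring sentences in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    significant = significant_words([w for t in tokenized for w in t])
    scores = [sentence_score(t, significant) for t in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `luhn_extract(document_text, n_sentences=4)` corresponds to the variant that selects a predetermined number of sentences; a threshold-based variant would filter on `sentence_score` instead.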



Source: The Scientific American, Vol. 196, No. 2, 68-94, February 1958 

Title: Messengers of the Nervous System 

Author: Amodeo S. Marazzi 

Editor’s sub-heading: The internal communication of the body is mediated by chemicals as well as by nerve 
impulses. Study of their interaction has developed important leads to the understanding and therapy of mental 
illness. 

Auto-Abstract 

It seems reasonable to credit the single-celled organisms also with a system of chemical communication by 
diffusion of stimulating substances through the cell, and these correspond to the chemical messengers (e.g., 
hormones) that carry stimuli from cell to cell in the more complex organisms. (7.0) 

Finally, in the vertebrate animals there are special glands (e.g., the adrenals) for producing chemical messengers, 
and the nervous and chemical communication systems are intertwined: for instance, release of adrenalin by 
the adrenal gland is subject to control both by nerve impulses and by chemicals brought to the gland by the 
blood. (6.4) 

The experiments clearly demonstrated that acetylcholine (and related substances) and adrenalin (and its 
relatives) exert opposing actions which maintain a balanced regulation of the transmissions of nerve impulses. 
(6.3) 

It is reasonable to suppose that the tranquilizing drugs counteract the inhibitory effect of excessive adrenalin 
or serotonin or some related inhibitor in the human nervous system. (7.3) 



Fig. 5.2. The first auto-abstract (from LUHN58) 

Because of the clear-cut and simple techniques, Luhn’s approach has retained 
some appeal, and it may especially win the heart of engineers. Luhn’s basic 
method of sentence selection was superseded in later research by more sophis- 
ticated ones, but it was never refuted. 

The limitations of Luhn’s pioneering approach are best seen by checking 
what information is used. The abstracting procedure is restricted to the relation 
between source information and textual abstracts only. In these, it considers 
strings as indicators for word types and approximates sentences to strings be- 
tween punctuation marks. A discourse is never considered, and texts are re- 
duced to sentences in sequence. Any communication function or adaptation to 
the communication situation was far beyond the horizon at that time. 



5.2.2 The TRW study: An abstracting system 
and a research methodology 

The TRW (Thompson Ramo Wooldridge Inc.) study of automatic abstracting 
(described in EDMU61, EDMU63, WYLL68, EDMU69) aimed at a system for 
indicative abstracting and at a research methodology which would make it 
possible to handle new texts and new abstracting criteria efficiently. 

The work of Edmundson and Wyllys remains remarkable because of its far- 
reaching insights. The research methodology comprised a study of the abstract- 
ing behavior of humans, a general formulation of the abstracting problem and 
its relation to the problem of evaluation, a mathematical and logical study of 
the evaluation problem and of relevance assessment, and a set of abstracting 
experiments employing a cycle of implementation, testing, and improvement. 
The TRW group considered five evaluation methods: 

• Intuitive value judgement 

• Comparison with a prefabricated “ideal” abstract 

• Construction of college-type test questions on the document content to be 
answered from abstracts by a sample population, testing the summary 
function of the abstracts 

• Retrieval tests via the abstracts 

• Statistical correlations 

The main problem of human abstracting is the decision about relevance: what 
is put into the abstract and what is not. The question was tackled by the TRW 
researchers and others (in particular RATH61, RESN61). They asked how 
consistent abstractors are in sentence selection. Abstractors were found to 
differ in their consistency in abstracting the same article at fairly widely 
separated points in time. The variation among different abstractors was even 
greater. But although the human abstractors were only modestly consistent 
in producing abstracts of a given document, intriguingly, the abstracts they 
produced were adequate. 

A major contribution of the TRW system is the principled approach to rele- 
vance assessment by combined arguments. The study showed the potential of 
four methods of sentence selection, which its authors called the cue, key, 
title, and location methods, respectively: 

• The cue method makes use of a list of words which are classified as bonus 
words (positive weight), stigma words (negative weight), or null words 
(irrelevant to sentence selection). 

• The key method is based on the occurrence frequency of words as proposed 
by Luhn in his pioneering work. 

• The title method starts out from words in the title and in subtitles. Sen- 
tences containing words that also occur in the title are assigned a higher 
weight than sentences which share words with subtitles, or sentences 
which have no such words. 

• The location method is based on the hypothesis that important passages 
are placed in known locations of the document, right after headings that 
announce them and in general early or late in a document, section, or 
paragraph. 

In the final system the relative weights of the four methods were integrated by 
a linear function. 
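
As a rough illustration of how four such arguments can be integrated by a linear function, consider the following sketch. The word lists, the unit weights in `a`, and the function names are assumptions made for the example; the actual TRW dictionaries and tuned weights are not reproduced here, and the location score only rewards the first and last sentence.

```python
import re

# Hypothetical cue dictionary; the TRW lists were far larger.
BONUS_WORDS = {"significant", "conclude", "results"}
STIGMA_WORDS = {"perhaps", "hardly"}

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def cue_score(sentence):
    # +1 for each bonus word, -1 for each stigma word, 0 for null words
    return sum((w in BONUS_WORDS) - (w in STIGMA_WORDS) for w in tokens(sentence))

def key_score(sentence, key_words):
    # key_words: high-frequency content words of the document (Luhn's idea)
    return sum(w in key_words for w in tokens(sentence))

def title_score(sentence, title_words):
    return sum(w in title_words for w in set(tokens(sentence)))

def location_score(index, total):
    # simplification: only the first and last sentence earn a location bonus
    return 1 if index in (0, total - 1) else 0

def rank(sentences, title_text, key_words, a=(1.0, 1.0, 1.0, 1.0)):
    """Linear combination a1*cue + a2*key + a3*title + a4*location."""
    title_words = set(tokens(title_text))
    n = len(sentences)
    scored = []
    for i, s in enumerate(sentences):
        total = (a[0] * cue_score(s) + a[1] * key_score(s, key_words)
                 + a[2] * title_score(s, title_words) + a[3] * location_score(i, n))
        scored.append((total, i, s))
    # highest score first; ties keep document order
    return sorted(scored, key=lambda t: (-t[0], t[1]))
```

Setting one of the weights in `a` to zero switches the corresponding method off, which makes it easy to compare the contribution of each argument on a test collection.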

Most interesting for later research are the conclusions that the TRW group 
drew from their own system development experience. WYLL68 concludes by pos- 
tulating a knowledge processing approach: 

The most serious disadvantage of current computer-produced abstracts is that 
they consist of individual sentences of the original text, extracted according to 
one or more criteria. Not only do the extraction criteria require further research, 
but the resulting set of individual sentences presents problems of disjointness, 
incompleteness, redundancy, and the like. The ultimate goal of research in 
automatic abstracting is to enable a computer program to “read” a document and 
“write” an abstract of it in conventional prose style, but the path to this goal is 
full of unconquered obstacles. 

From his evaluation of automatically generated abstracts, EDMU69 similarly 
came to the conclusion that more information must find its way into the selec- 
tion of contents for abstracts: 

... future automatic abstracting methods must take into account syntactic and 
semantic characteristics of the language and the text: they cannot rely simply 
on gross statistical evidence. 



Research striving to accomplish these insightful suggestions is still under way 
in 1998. 



5.2.3 ADAM - the automatic document abstracting method 

The ADAM system for automatic document abstracting relies on two key com- 
ponents, a dictionary called the word control list (WCL) and a set of rules 
specifying functions for each WCL entry. The system aims to produce abstracts 
with the following characteristics: 

• Objectives of the work are included, as well as results and conclusions. 
Methods are included only if they are the main purpose of the investiga- 
tion. 

• Negative results are excluded, unless they are the sole results. 

• No examples, explanations, speculative statements, opinions, or compari- 
sons are accepted in the abstract. 

• Preliminary remarks, equations, footnotes, references, quotations, tables, 
charts, figures, graphs, descriptive cataloguing data, and the like are not 
included. 

• Unconventional or rare characters and abbreviations are excluded. 

• Except for actual results, abstracts contain no numbers. 

• The terminology in the abstract follows the terminology of the original. 

• The size of the abstract is approximately 10% of the original document. 

ADAM almost exclusively uses the cue method. Cue words or phrases are 
listed in the WCL together with a semantic weight and a syntactic value used 
for deciding whether the current sentence is a candidate for deletion or for 
retention. The WCL contains positive phrases such as “our work” or “this 
paper”, but also negative ones such as “obvious” or “believe” that support the 
deletion of the sentence in which they occur. The WCL contains fewer than 700 
entries. The 14 semantic weights range from “T” for very positive terms which 
almost unequivocally indicate something important, via “A” for very negative 
terms and “L” for introductory quantifiers such as “once” or “a”, to “D” for 
words that are to be deleted. The codes of the semantic weights are at the 
same time functions that process data accordingly. The same is true for the 
syntactic markers, which most of the time correspond to word classes, but also 
include a deletion mark. They serve to perform a partial syntactic analysis. 

Whereas the cue words and phrases are first and foremost used to decide 
about the relevance of sentences, some cue words such as “these”, “it”, and 
“above” are exploited to infer intersentence relationships. 

The ADAM authors wanted abstracts to be as coherent as possible, both 
logically and linguistically. To achieve this aim, MATH72 submitted the chosen 
sentences to some transformations. If a sentence considered worthy of 
inclusion in the abstract required an antecedent (as seen from function words 
or phrases such as “for that reason”, “hence”, “those”, etc.), the three 
sentences preceding the selected one were examined to determine whether they 
should also be included in the abstract. If no related sentences were found, 
the selected sentence was rewritten to make it stand alone. If this was not 
feasible, it was deleted. Rewrites often improve not only text coherence but 
also the readability of the abstract, at the same time yielding a terser 
formulation. For instance, the following two sentences from the source 
document are combined into one abstract sentence, shown third: 

The experiment resulted in a modification of the original hypothesis. 

The experiment resulted in a change of our basic approach. 

The experiment resulted in a modification of the original hypothesis and in a 
change of our basic approach. 
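
The antecedent handling described above can be sketched as follows. The sketch is a simplification under stated assumptions: the cue tuple contains only the three examples named in the text, and the caller supplies two hypothetical helpers, `is_antecedent` (decides whether an earlier sentence is related) and `rewrite` (returns a stand-alone version, or `None` if none is feasible). How ADAM actually made these two decisions is not modeled.

```python
import re

# Cue phrases taken from the text; the actual WCL entries are not reproduced.
ANAPHORIC_CUES = ("for that reason", "hence", "those")

def needs_antecedent(sentence):
    """True if the sentence contains a cue that points back to earlier text."""
    padded = " " + re.sub(r"[^a-z]+", " ", sentence.lower()) + " "
    return any(f" {cue} " in padded for cue in ANAPHORIC_CUES)

def build_abstract(sentences, selected, is_antecedent, rewrite, lookback=3):
    """Keep selected sentences, pulling in antecedents from the preceding three."""
    keep = {}                                    # sentence index -> text to emit
    for i in sorted(selected):
        s = sentences[i]
        if not needs_antecedent(s):
            keep[i] = s
            continue
        related = [j for j in range(max(0, i - lookback), i)
                   if is_antecedent(sentences[j], s)]
        if related:
            for j in related:                    # include the antecedents as well
                keep.setdefault(j, sentences[j])
            keep[i] = s
        else:
            standalone = rewrite(s)              # None: no rewrite is feasible
            if standalone is not None:
                keep[i] = standalone             # otherwise the sentence is dropped
    return [keep[i] for i in sorted(keep)]
```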



5.2.4 Sentence extraction on the basis of the functional text weight 

The research of SKOR71, SKOR81 is based on the assumption that a method 
of automatic abstracting must take the current text into account if it is to ob- 
tain good results. The individual characteristics of a given text are defined by a 
semantic network. Two sentences are said to be semantically related 

• if at least one noun occurs in both sentences, 

• if the sentences contain two words which have been predefined as being 
semantically related (think for example of fire and heat), or 

• if the sentences contain two words which are related with respect to a 
given text. 

The significance of each sentence is assumed to be directly proportional to the 
number of sentences which are semantically related to it. Thus, nodes of the 
network which have the most incident arcs are defined as being the most sig- 
nificant. Skorochodko obtains a relevance assessment method that approxi- 
mates the connectivity of sentences. In scoring connectivity, his approach is in 
line with that of TRAB85a/85b. 
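
A minimal sketch of this connectivity measure is given below. It rests on several assumptions: shared content words stand in for shared nouns (no part-of-speech tagging is performed), the table `RELATED_PAIRS` holds only the fire/heat example from the text, the stop list is illustrative, and the third criterion (words related with respect to a given text) is left out.

```python
import re
from itertools import combinations

STOP_WORDS = {"the", "a", "an", "of", "in", "on", "through", "and", "to"}
# Predefined semantically related word pairs; only the example from the text.
RELATED_PAIRS = {frozenset(("fire", "heat"))}

def content_words(sentence):
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w not in STOP_WORDS}

def related(a, b):
    """Two sentences are linked by a shared word or a predefined related pair."""
    if a & b:
        return True
    return any(frozenset((x, y)) in RELATED_PAIRS for x in a for y in b)

def connectivity(sentences):
    """Number of incident arcs for each sentence node of the network."""
    bags = [content_words(s) for s in sentences]
    degree = [0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        if related(bags[i], bags[j]):
            degree[i] += 1
            degree[j] += 1
    return degree
```

Sentences with the highest degree are the candidates for extraction; the same counts serve as functional weights in the sense used below.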

CLAU87 uses concepts from SKOR81 and specifies an algorithmic method 
of abstracting which extracts sentences on the basis of their functional value in 
the original text. She compiles abstracts (or extracts) of 50 Russian magazine 
articles on electronics. She expects her algorithmically produced abstracts to 
include details about the goal/purpose of the article, the method or the experi- 
ment carried out, results, application, and conclusions, provided that these sub- 
texts appear in the original document. The abstract should consist of not less 
than three and not more than six Russian sentences. Semantic relations are 
built up through semantic equivalence or reference identity of concepts. The 
relations can exist between primary nominations in the title and related 
secondary nominations in the text, or between sentences in the text. The func- 
tional weight of a sentence results according to SKOR81 from the number of 
other sentences with which the sentence is semantically linked. Sentences 
with a high functional weight are considered central. 

At this point of development, the method consists of eight stages: 

1. Determining sentence limits and numbering the sentences 

2. Breaking down the title into title elements (primary nominations of con- 
cepts) 

3. Identifying secondary nominations in the text 

4. Determining intersentence semantic links 

5. Identifying the sentences in the text that are semantically linked to each 
title element 

6. Identifying the functional weight that characterizes the value of the indivi- 
dual sentences in the overall text 

7. Sorting the sentences according to their functional weights 

8. Including the sentences with a functional weight above level 3 in the ab- 
stract. 



5.3 Systems following the advent of cognitive science 



Whereas early work in computerized summarization concentrated on the prac- 
tical task of abstracting scientific and technical documents, later approaches 
inspired by cognitive science incorporate a model of the intellectual process 
during summarization. This theoretical background was not yet available when 
the first abstracting systems were developed. Before turning to concrete sys- 
tems as listed in Table 5.1, we remind ourselves of essentials of cognitive text 
processing and summarizing theories (see Chaps. 2 and 3 if needed). 

Summarization systems stemming from theories about human cognition 
globally follow the basic organization of cognitive processing described by the 
postulates of KINT83 (see Fig. 3.2). Human beings first of all understand the 
text. They interpret it and reconstruct its meaning with respect to their prior 
knowledge. Inferences correspond to the thinking acts of human comprehend- 
ers. Inferences drawing upon prior knowledge fill the inevitable coherence gaps 
of incoming information and adapt the new knowledge to what is already in 
stock. The result of input interpretation is represented in memory. The memory 
representation serves as a basis for a textual summary. 

One central instrument of understanding and summarizing is that of cognitive 
schemata. They have computational realizations as frames and scripts, which 
are well-known formats of knowledge representation. The schemata include 
prior knowledge about the meaning structure of the incoming events. This 
knowledge allows us to examine the input representation with regard to the 
most important aspects (actions, actors, etc.). A reader who understands an 
article about a political convention uses knowledge about what goes on in 
political conventions, and a listener who follows the report of a children’s 
birthday party can only do so because (s)he knows the structure of the event, 
in other words, because (s)he knows a cognitive schema that stores prior 
knowledge about that event type. 

Expectation-driven understanding paves the way for summarization because 
it allows us to decide about the importance of incoming knowledge items. As 
soon as the newly acquired text knowledge is entered in the schema, the im- 
portance of the individual knowledge items is assessed with respect to the 
schema. We know, for instance, that in a description of a murder, the murderer 
and the victim are important. 



Table 5.1. Summarization systems inspired by cognitive science theories 

Approach      Tasks addressed  Methods                 Special features        Source data

FRUMP 80-82   Summarizing      Knowledge processing,   Use of event schemata   News about events
                               expectation-driven
                               partial parsing

SUSY 82-85    Summarizing      Text processing,        Design of a             Scientific texts
                               theoretical basis,      comprehensive text
                               partial realization     processing model

TOPIC 86-89   Indicative       Text representation,    Theme-rheme tracking,   Medium-sized
              summarizing      word expert parsing,    subtext representation, computer articles
                               ontology activation     graphical presentation

SCISOR 87-90  Summarizing      Knowledge processing,   Integration of multiple Time-dependent news,
                               memory model            news articles,          corporate takeovers,
                                                       memory model            terrorism

PAULINE 88    Summary          Modeling of             Pragmatic audience      Story base
              generation       communicative           adaptation
                               intentions


Theories about human cognitive organization also serve as basic system archi- 
tecture for summarization systems. In contrast with former approaches, these 
summarization systems use semantic representations of text meaning which 
must be derived from input. The internal representation is reworked by infer- 
ences. It feeds utterances, among them summaries. 

The human mind is flexible and often knows more than one suitable organiza- 
tion for a task. This is also true for summarization. The most important choice 
regards the central activity in summarization, the reduction of an information 
aggregate to its most essential points. It can be integrated into understanding 
and into target output generation. In the first case, we try to confine perception 
to the things we want to know, excluding everything else from our view. In the 
second, we select from the information offer what is interesting for a special 
purpose. Computerized systems may follow both approaches. 

Reduction during comprehension has advantages: instead of understanding a 
large body of information (for instance a complete data base, monograph, or 
collection of photographs) while looking only for one or a few items contained 
in it, it is preferable to restrict input right away to the item (or items) of inter- 
est, excluding everything else from interpretation. We then arrive at a cogni- 
tive activity which is currently called information seeking or information ex- 
traction. Information reduction during understanding is possible if we know in 
advance which items are important. However, this is not always the case. 

Meaning reduction may also occur during question answering, or in general, 
as soon as we know the addressee and the question. The information request al- 
locates relevance to a small part of all the knowledge that we have in stock. 
Since we do not know until the question is asked which information is consid- 
ered important, we cannot summarize beforehand. Instead, we must insist on 
last-minute relevance assessment and summarization. In such a constellation, 
summarization typically happens while the information is reworked for a par- 
ticular use, giving it a presentation that facilitates its absorption, in the sim- 
plest case as a normal printed text. 

When summarization is postponed to delivery time we may preprocess the 
material in order to be prepared and faster at question time. Summarization 
can start from preprocessed data, for example from a well-organized file or a 
comprehensive database that contains knowledge about the topics in a domain. 
Under these conditions, the task of the summarizer is restricted to selecting the 
items of interest and packing them concisely in text, in order to transmit a 
maximum of information with the least possible effort for its user. For a com- 
puterized system, this operationalization of summarizing has the advantage of 
providing relief from discourse understanding. It makes the task easier. 

After this introduction, the reader will expect newer approaches to computer- 
ized summarization to be preoccupied with knowledge processing, whereas the 
pioneer systems kept to the features of the linguistic surface. The systems 
which are discussed in the following will not disappoint these expectations: they 
transform their input into an internal representation, they make use of text or- 
ganization features, they have memories, they operate according to communi- 
cative intentions, and so on. 



5.3.1 FRUMP 

We learned above about special cognitive structures called scripts, which or- 
ganize our knowledge about common events such as visits to restaurants or 
children’s birthday parties. For many events regularly reported in news articles 
we can indicate scripts as well: demonstrations, earthquakes, visits of states- 
men, political conventions, baseball matches, etc. For all these events, citi- 
zens and newspaper readers have appropriate scripts which allow them to in- 
terpret a new event in the light of prior experience. From these scripts of com- 
mon events people draw a preestablished structure of interesting information 
items and their relations. For instance, they expect political conventions to 
have delegates debating and voting. 

FRUMP (DEJO82) adopts the human strategy of interpreting new informa- 
tion with expectations derived from our own cognitive schemata. FRUMP’s 
pragmatic and semantic knowledge is coded in a knowledge base. It is used to 
predict general events that are likely to be reported. The text analyzer then 
tries to find instances of these expected events in the input text. Thus 
FRUMP’s interpretation is expectation-driven. On the basis of whether or not 
predicted events are found, FRUMP reassesses its interpretation of the situa- 
tion and makes new predictions. 

FRUMP’s knowledge of situations in the world is organized in sketchy 
scripts. A sketchy script contains only the important events that may occur in a 
situation, whereas other types of script may be more comprehensive. FRUMP 
has scripts for earthquakes, demonstrations, explosions, and other events. For 
instance, the demonstration script expects the following events: 

Event 1: The demonstrators arrive at the demonstration location. 

Event 2: The demonstrators march. 

Event 3: The police arrive on the scene. 

Event 4: The demonstrators communicate with the target of the demon- 
stration. 

Event 5: The demonstrators attack the target of the demonstration. 

Event 6: The demonstrators attack the police. 

Event 7: The police attack the demonstrators. 

Event 8: The police arrest the demonstrators. 

For an earthquake, we might set up a structure that accounts for the time and 
the place of the quake, its strength, the injuries, the damages, and so on. The 
script also decides about what it is important to know in an event. Since the 
color of the cars and the number of spectators are not mentioned in an earth- 
quake script, they are denied any importance, even if there should be informa- 
tion about them in an article. Among the items contained in the script, some 
may be central (e.g., the strength of the quake and the number of victims), 
while others are more marginal, such as the reporting body in the earthquake 
example below (Fig. 5.3). 

For each expected event there is a description containing 

• semantic constraints on script variables (e.g., the demonstrators must be 
human) 

• constraints between script variables (e.g., demonstrators and police must 
be in the same location) 

• causation relations to other requests and references to other sketchy scripts 
(e.g., any deaths in a vehicle accident are due to the crash event). 
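
To make the first two kinds of constraint concrete, an expected event can be pictured as a small record that checks candidate bindings of its script variables. The sketch below is not FRUMP's own representation; the classes `Entity` and `ExpectedEvent`, their fields, and the coarse semantic classes are assumptions made for the example, and causation relations and references to other scripts are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str        # coarse semantic class, e.g. "human" or "organization"
    location: str

@dataclass
class ExpectedEvent:
    label: str
    roles: tuple                 # script variables bound by this event
    role_kinds: dict             # semantic constraints on single variables
    same_location: tuple = ()    # constraint between script variables

    def accepts(self, bindings):
        """Check a candidate binding {role: Entity} against all constraints."""
        required = set(self.roles) | set(self.role_kinds) | set(self.same_location)
        if required - set(bindings):
            return False                         # a script variable is unbound
        for role, kind in self.role_kinds.items():
            if bindings[role].kind != kind:
                return False                     # semantic constraint violated
        places = {bindings[r].location for r in self.same_location}
        return len(places) <= 1                  # all must share one location

# Event 8 of the demonstration script, with hypothetical constraints
arrest = ExpectedEvent(
    label="The police arrest the demonstrators",
    roles=("police", "demonstrators"),
    role_kinds={"demonstrators": "human"},
    same_location=("police", "demonstrators"),
)
```

A text analyzer would call `arrest.accepts(...)` on each candidate reading of a sentence and keep only readings that satisfy the script's expectations.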

Given that FRUMP is equipped with a set of scripts, the next problem is how 
to activate the right ones. There are three different mechanisms for referencing 
a script: 

• The simplest case happens when a script is explicitly mentioned in the 
text, like the earthquake script in the message below (Fig. 5.3). 

• An implicit reference to a script can be established if an element occurs 
which is normally related to or included in an event. The police arresting 
demonstrators would enable us to infer that there must have been a dem- 
onstration event. 

• Event-induced activation takes place if the key request of a sketchy script 
is detected in a text. Thus the arrest script would pop up as soon as 
FRUMP finds that some robber is apprehended by the police. 
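
The three activation mechanisms can be sketched as lookups in three cue tables. The tables, the script names `$demonstration` and `$arrest`, and the string encoding of events are assumptions made for illustration; FRUMP works on conceptual structures, not on strings.

```python
from typing import Dict, List, Set

# Hypothetical toy tables; the real knowledge base is far richer.
EXPLICIT_CUES: Dict[str, str] = {
    "earthquake": "$earthquake",
    "demonstration": "$demonstration",
}
# Events that are normally part of another script (implicit reference).
IMPLICIT_CUES: Dict[str, str] = {
    "police arrest demonstrators": "$demonstration",
}
# Key requests whose detection triggers a script (event-induced activation).
KEY_REQUESTS: Dict[str, str] = {
    "police apprehend suspect": "$arrest",
}


def activate_scripts(words: List[str], events: List[str]) -> Set[str]:
    """Return the names of all scripts activated by the analyzed input."""
    active: Set[str] = set()
    for word in words:                      # explicit reference
        if word.lower() in EXPLICIT_CUES:
            active.add(EXPLICIT_CUES[word.lower()])
    for event in events:
        if event in IMPLICIT_CUES:          # implicit reference
            active.add(IMPLICIT_CUES[event])
        if event in KEY_REQUESTS:           # event-induced activation
            active.add(KEY_REQUESTS[event])
    return active


print(activate_scripts(["A", "small", "earthquake"], []))
```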

In order to activate and to instantiate the scripts, the text data must be ana- 
lyzed, but no full parsing component is necessary. Instead we find a bottom-up 
arguing substantiator. It tries to verify from input the expectations derived from 
the current sketchy script. 

FRUMP has been used to summarize short press articles like the sample text 
shown in Fig. 5.3. 

Even after more than a decade, FRUMP is still worth studying for at least 
three reasons: 

• It has proposed a knowledge-processing approach to summarizing, adopt- 
ing the global organization of human cognitive processing. 

• It has been followed by designers of summarization systems who have 
reached the frontier of commercial application, such as the SCISOR sys- 
tem described below. 

• It has prepared methods for conceptual information retrieval, where information
is no longer retrieved with the help of surface cues, but by searching
for the concepts themselves.

Mount Vernon, Ill. (UPI) - A small earthquake shook several Southern Illinois counties
Monday night, the National Earthquake Information Service in Golden, Colo., reported.

Spokesman Don Finley said the quake measured 3.2 on the Richter scale, “probably 
not enough to do any damage or cause any injuries.” The quake occurred about 7:48 
p.m. CST and was centered about 30 miles east of Mount Vernon, Finley said. It was 
felt in Richland, Clay, Jasper, Effington and Marion Counties. 

Small earthquakes are common in the area, Finley said.

Sketchy script: $earthquake 

Summary: There was an earthquake in Illinois with a 3.2 Richter scale reading. 



Fig. 5.3. Sample news article processed by FRUMP (from DEJO82)



5.3.2 SUSY - a summarizing system for scientific texts

The design of the SUSY system (FUM82, FUM84, FUM85a/b) is of particular 
interest because the authors use research findings of KINT74/78 as the concep- 
tual background for their system. Their summarizing strategies are intended 
with a certain degree of abstraction to follow the human approach, above all 
the human ability to consider the interests and goals of the addressee with re- 
spect to the length and contents of summaries. Unfortunately, only parts of the 
system have been implemented, but the overall design can be described on the 
basis of a system outline by FUM82. 

SUSY is intended to understand and summarize the meaning of a special- 
ized text, for example a scientific essay. In the dialogue with the user, SUSY 
receives a text schema and a summary schema together with the input text. In 
these, the user describes how the input text is structured and what the structure 
of the summary should look like. The schemata are retrieved from a library of 
basic text and summary structures and adapted accordingly; where necessary a 
totally new schema is also defined. These schemata help to guide the analysis 
with expectations. 

The SUSY parser constructs a propositional internal text representation in 
accordance with KINT74/78. It includes specialists in the sense of word expert 
parsing (SMAL82) except that they have to deal with extensive problem areas 
such as syntax or semantics. The syntax specialist, for example, has the task of 
constructing a case grammar-oriented sentence representation. 

The parser proceeds in three stages (see Fig. 5.4): 

1. In the sentence understanding phase, SUSY constructs from the natural language
input text a propositional representation, for which a specification is
available (FUM84). The result of the sentence understanding is the basic
linear representation. It represents the meaning of the input sentences. 

2. In the text structure analysis, the basic linear representation is expanded 
into the extended linear representation, which additionally records the text 
meaning structure, i.e.: 

• the logical structure of the text, in other words the conceptual rela- 
tions between sentences 

• the rhetorical text structure, which explains the sequence of ideas and 
arguments in the text. 

3. The importance ranking component orders the elements of the extended lin- 
ear representation hierarchically according to importance. Each element is 
given a weight. The result of this phase is a hierarchical propositional net- 
work, which represents the result of the most profound understanding of the 
input text. 

Summarizing is described in FUM84 as the discarding of those parts of the hi- 
erarchical propositional network that contain less important information. The 
discarding strategy takes account of the user’s goals and requirements. The 
prototype of the importance ranking component comprises 40 rules and 30 
frames for encyclopedic knowledge. 
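
Summarizing by discarding can be illustrated on a small weighted tree. This is a minimal sketch, assuming that importance ranking has already attached a weight to each proposition and that the user's requirements can be reduced to a single threshold. The class `Proposition`, the functions, the weights, and the example propositions are hypothetical and are not taken from FUM84.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Proposition:
    text: str
    weight: float                     # importance assigned by the ranking step
    children: List["Proposition"] = field(default_factory=list)


def prune(node: Proposition, threshold: float) -> Proposition:
    """Return a copy of the tree without subtrees whose root weight lies
    below the threshold. The top node itself is always kept."""
    kept = [prune(c, threshold) for c in node.children if c.weight >= threshold]
    return Proposition(node.text, node.weight, kept)


def flatten(node: Proposition) -> List[str]:
    """List the surviving propositions in depth-first order."""
    result = [node.text]
    for child in node.children:
        result.extend(flatten(child))
    return result


network = Proposition("SUSY summarizes scientific texts", 1.0, [
    Proposition("the parser works in three stages", 0.8, [
        Proposition("the syntax specialist builds case frames", 0.3),
    ]),
    Proposition("only parts of the system are implemented", 0.4),
])

print(flatten(prune(network, threshold=0.5)))
```

A lower threshold keeps more of the network and so yields a longer summary, which is one simple way of letting the addressee's requirements control the length.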




Fig. 5.4. The SUSY system (adapted from FUM84) 







The last subactivity of the system is the generation of a natural language 
summary. The generator takes the propositions that have not been pruned dur- 
ing importance ranking and retrieves the basic linguistic elements from the 
original text. From these it compiles the summary text. In doing so, it uses sen- 
tence models which introduce the basic rules of abstracting style. 

Of particular interest are the relevance rules that FUM85a/85b demonstrate, 
because they explain summarizing operations in a content-related or causal 
manner. They work on the micro-level of summarizing operations. FUM85a/85b 
show with the help of short text excerpts (72 words maximum) how their rele- 
vance rules apply. The rules themselves are taken from the literature. The 
authors distinguish structural rules, which refer only to the visible text struc- 
ture, semantic rules and encyclopedic rules, which compare text knowledge 
with factual knowledge. The rules can reflect the readers’ intentions. Well- 
known arguments about relevance are reflected in them. Among other things, 
the information ranking rules assess as important 

• concepts that are connected to many other concepts, i.e., that form key 
points for the cohesion of the text 

• definitions which are given the same weight as the concept they refer to 

• general concepts. 
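
The first of these criteria, connectedness, can be approximated by counting links. The sketch below is an illustration only: it assumes that the text representation is available as a list of concept pairs, and the function name and the example concepts are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_by_connectivity(links: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Count for each concept how many links it takes part in and sort the
    concepts by that count, highest first (ties in alphabetical order)."""
    degree: Counter = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


links = [
    ("summary", "text"),
    ("summary", "reader"),
    ("text", "reader"),
    ("summary", "abstract"),
]
print(rank_by_connectivity(links))
```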



5.3.3 TOPIC/TWRM-TOPOGRAPHIC: Indicative summaries 
from text graphs 

The TOPIC/TWRM-TOPOGRAPHIC system (HAHN90, KUHL89) processes 
relatively short specialized articles in the field of information and communications
technology. They are chosen from the full-text database of the German
Association of Engineers (VDI). TOPIC adheres to the global process organization
of cognitive text processing systems. It realizes an abstracting operation
by transferring the input document into its knowledge base and presenting contents
of the knowledge base to the user (see Fig. 5.5). In the output, graphic and
textual forms of representation are used side by side, i.e., the knowledge
structures are either presented graphically or they are textualized prior to being 
transmitted to the user. In the retrieval dialogue the user can work through the 
different stages of detail of the representation that TOPIC/TWRM-TOPO- 
GRAPHIC offers. Complete texts, text segments, text graphs, an automatically 
generated abstract and the topic-specific conceptual knowledge appear at the 
same time in screen windows and can be consulted in the search for information.

Fig. 5.5. Information types presented by the TOPIC system (adapted from KUHL89)



Since the aim of TOPIC is indicative abstracting, the input text is parsed only 
as far as needed for the identification of nouns and nominal groups with the 
accompanying attributes. Word experts recognize which concepts and more 
general linguistic phenomena (e.g., quantifying) occur in the text. The occurrence
of the respective lexicon elements in the text triggers the activity of the appropriate
word experts (SMAL82). TOPIC recognizes the relevant keywords with
the help of a thesaurus-like ontology, displaying them together with the relations
between them in a conceptual network. Concepts occurring in the document
activate the corresponding concepts in the ontology. The activated part of
the knowledge network provides the meaning of the processed text or text 
segment in so far as it is represented in nominal concepts. Parsing takes text 
organization into account, detecting patterns of local text cohesion (anaphora, 
cataphora, and lexical cohesion) that correspond to the basic form of the 
theme-rheme organization of texts (DANE74). Concepts are grouped in basic 
text constituents, representations of texts or text parts (such as paragraphs or 
sections). They contain the thematically relevant concepts of a text segment, 
whose semantic relations are expressed through super/subordinated term, and 
prototype/instance relations. The nodes correspond to noun concepts, possibly 
with attributes. They are realized as frames, in the case of greater specificity 
also as slots or slot entries in frames. 

Text condensing takes place by forming the already mentioned text constitu- 
ents and by reducing their meaning to that of the dominant concept. There are 
various criteria for determining dominant concepts. Among other things, a con- 
cept is considered to be dominant if it has a sufficient number of activated 
subsumptions. Every paragraph boundary triggers a condensing process, which 
thus defines the theme of the last paragraph. 
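
The dominance criterion named above can be sketched as follows. The code assumes that the ontology is given as a mapping from each concept to its directly subordinated concepts. The threshold `min_active` and the example concepts are hypothetical, and the other dominance criteria used by TOPIC are not modeled.

```python
from typing import Dict, List, Set


def dominant_concepts(
    subsumes: Dict[str, Set[str]],   # concept -> its direct subordinate concepts
    activated: Set[str],             # concepts activated by the paragraph
    min_active: int = 2,             # hypothetical threshold
) -> List[str]:
    """Return the concepts with at least `min_active` activated subordinates."""
    result = []
    for concept, subordinates in subsumes.items():
        if len(subordinates & activated) >= min_active:
            result.append(concept)
    return sorted(result)


ontology = {
    "computer": {"workstation", "mainframe", "laptop"},
    "software": {"compiler", "editor"},
}
paragraph_concepts = {"workstation", "laptop", "editor"}
print(dominant_concepts(ontology, paragraph_concepts))
```

Running such a check at every paragraph boundary would give one theme candidate list per paragraph.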

When a textual abstract is generated, the relevant nominal concepts from 
the text graphs are selected and incorporated in text. The central text themes 
are taken from the user’s probe question. The user can request an indicative or 
an indicative-informative abstract. After that a text plan is created. It deter- 
mines which concepts should be expressed in a sentence, how the relations 
between the concepts are established, in what order the sentences can be ar- 
ranged and how inter-sentence links can be made clear. Subsequently, the text 
plan is translated into natural language with the help of templates. A discourse 
strategy determines the sequence of the text constituents. 



5.3.4 SCISOR (System for Conceptual Information Summarization, 
Organization, and Retrieval) 

SCISOR summarizes newspaper stories using a conceptual representation of 
knowledge about possible events. JACO90 present it as an alternative to a tra- 
ditional retrieval system: SCISOR realizes a conceptual retrieval because it 
outputs conceptual structures from its knowledge base. The system is interest- 
ing as an approach to summarizing because it is supposed to produce summa- 
ries from the large-scale knowledge base in its memory. Especially in the case 
of full-text databases, RAU89 argue, it is often of no help to users to receive 
references to complete documents from which they then have to extract the
knowledge which is of interest. In the application field of SCISOR, this means
that from possibly many separate reports about corporate mergers they must reconstruct
the core of events, which can be scattered over a long period of time. They
are better informed with an integrative summary which relieves them of this 
preprocessing task. 

SCISOR is also included in the software product GE NLToolset which aims 
at extracting information from text using a knowledge-based, domain-independ- 
ent core of text processing tools, and customizing the existing programs to 
each new task. After explaining SCISOR, we look at the customized system in 
an information extraction environment as proposed at the fourth Message Understanding
Conference (MUC-4).

SCISOR in the corporate merger domain. Figure 5.6 shows how news arti- 
cles and questions are processed in the system. The news items are adopted as 
they are received from the press agency. The first problem is to select the arti- 
cles that deal with corporate take-overs. The relevant articles are then ana- 
lyzed according to a mixed data- and expectation-driven method. The prefilter 
breaks incoming articles down into their headline, a subtitle, the actual text, 
and a dateline. After that, the (composite) filter determines whether the arti- 
cles belong to the subject field. Articles with series titles that are known to be 
irrelevant are discarded straight away. SCISOR then compares the story with a 
keyword list that contains around 150 words such as buy, merger, and acquisi- 
tion. Each keyword has a weight. When words of this kind are found, a rele- 
vance index for the article is calculated. This helps to decide whether the arti- 
cle should be accepted into the SCISOR system. A pattern analysis is carried 
out on the remaining news with doubtful relevance. It searches for patterns of 
multiword groups (example: buy &&& debt obligations). The patterns may con- 
tain gaps and make positive or negative statements about whether the article 
belongs to the subject area. A metric is then calculated from them. If the 
analysis described above is still unable to determine whether an article is rele- 
vant, a conceptual and linguistic analysis is carried out. 
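
The keyword stage of the filter can be illustrated with a short sketch. Only the three example keywords come from the description above. The weights, the two thresholds, and the function names are assumptions, and the later pattern stage is not modeled.

```python
import re
from typing import Dict

# Hypothetical weights; the list described above contains about 150
# weighted keywords such as "buy", "merger", and "acquisition".
KEYWORD_WEIGHTS: Dict[str, float] = {"buy": 1.0, "merger": 3.0, "acquisition": 3.0}


def relevance_index(text: str, weights: Dict[str, float] = KEYWORD_WEIGHTS) -> float:
    """Sum the weights of all keyword occurrences in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(weights.get(token, 0.0) for token in tokens)


def classify(text: str, accept: float = 4.0, reject: float = 1.0) -> str:
    """Three-way decision: accept, reject, or hand over to pattern analysis."""
    score = relevance_index(text)
    if score >= accept:
        return "accept"
    if score <= reject:
        return "reject"
    return "doubtful"


print(classify("The merger follows an acquisition offer."))
print(classify("Rain is expected tomorrow."))
print(classify("The group may buy the debt and buy more."))
```

Articles classified as doubtful in this sketch correspond to the news with doubtful relevance that is passed on to the pattern analysis.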

SCISOR stores the conceptual representation of the story as an episodic net- 
work (see Fig. 5.8). The concepts occurring are individual concepts, but they 
are also members of classes such as companies, offers, or mergers. The asso- 
ciation with classes makes the individual stories retrievable. Questions are 
analyzed in the same way as articles. On retrieval, the representations of the 
question and the story are matched. In this way, the answer and the most rele- 
vant articles are found. 

Figure 5.7 shows how a partial syntactic analysis and a top-down semantic 
analysis cooperate to derive a conceptual structure. The example sentence de- 
scribes an offer for a corporate takeover. It presents at least one ambiguity at 
the syntactical level: we cannot know whether the investment group is the
company to be sold, or whether the investment group intends to purchase Revere.

We see that the syntactic analysis produces tree structures which characterize
clauses (Revere, received an offer, an offer from an investment group, etc.).
The parser remains in the safe area of nominal and verbal groups and avoids 
being caught in the difficult interpretation of sentence junctions (said or about) 
which often are ambiguous. 



Fig. 5.6. Information flow in SCISOR (adapted from JACO90)



During semantic analysis, the syntactic trees are given a semantic interpre- 
tation in terms of verbal case frames. Besides the concept offer, the verbal 
concept acquiring plays the central role in the sentence. It is associated with 
an actor who acquires, an object which is acquired, possibly the owner of the 
object (in a prepositional phrase), and a remuneration which persuades him or 
her to cede ownership (introduced by a preposition like “for”). Using the case 
frame of the verb, the semantic analysis switches from a description in syntac- 
tical terms (subject, object, prepositional phrase, etc.) to the representation of 
an offer and a buying event. 

The concept corporate takeover is not mentioned in the input sentence. 
Nevertheless, the event that is reported is that a company offers to purchase 
another company, which is a move in a corporate takeover. Anyone who does not
know the definition of a corporate takeover is unable to infer from the data
what is really going on in the reported event. An understander solves the problem
by referring to his or her own knowledge.

Fig. 5.7. SCISOR analysis (from JACO90)



SCISOR follows this approach. In its memory (see Fig. 5.8), the application 
domain corporate mergers is represented. There, we find a frame formulating 
the expectation that there must be takeovers and offers for a takeover, each 
having its conceptual organization in the form of a frame whose slots must be
matched by information elements found in the input. If suitable items from the
input can be integrated in an instance of the frame, the input sentence must 
have reported an event of the expected sort. The frame describing the instance 
of the event (a concrete episode) is added to the system’s knowledge base. 

The memory organization of SCISOR is inspired by current ideas about hu- 
man memory. It comprises three abstraction levels: event knowledge, abstract 
knowledge and semantic knowledge (see Fig. 5.8). Semantic knowledge en- 
compasses the meaning of concepts, abstract knowledge includes generaliza- 
tions about events. In the event memory, knowledge about the episodes them- 
selves is stored. In addition, groups of related concepts from the episodes and 
from the abstract knowledge are indexed.

Fig. 5.8. SCISOR memory organization (from RAU87)

From its memory, SCISOR generates summaries to answer users’ questions.
The user question determines the topic. The concepts that the question acti- 
vates in the knowledge base are candidates for inclusion in the summary. A 
question referring to events surrounding a specific takeover can draw upon the 
event memory, picking there for instance the merging companies and the date 
of a merger offer (see Fig. 5.8). RAU89 argue that it is possible to approximate 
the decision about concept centrality by measuring the distance from the core 
concept, the one the user asked about. This summarizing principle is em- 
pirically verifiable and is also known from other contexts. A further method of 
summarizing is that SCISOR bundles concepts from one group together under a 
generic concept. This helps to abridge “historical” details and information 
about the situation in which a single action of a corporate takeover took place.
Thus the SCISOR system uses two main question-time summarizing strategies, 
the exclusion of marginal ideas and generalization. 
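
The distance criterion can be sketched as a breadth-first search over the concept network. This is an illustration under assumptions: the network is reduced to an adjacency mapping, all links count equally, and the concept names and the cut-off `max_distance` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def concepts_within(graph: Dict[str, Set[str]], core: str, max_distance: int) -> List[str]:
    """Return the concepts at most `max_distance` links away from `core`,
    ordered by distance and then alphabetically."""
    distance = {core: 0}
    queue = deque([core])
    while queue:
        current = queue.popleft()
        if distance[current] == max_distance:
            continue                      # do not expand beyond the cut-off
        for neighbour in sorted(graph.get(current, ())):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return sorted(distance, key=lambda c: (distance[c], c))


graph = {
    "takeover": {"offer", "suitor", "target"},
    "offer": {"takeover", "price"},
    "price": {"offer", "currency"},
}
print(concepts_within(graph, "takeover", 1))
print(concepts_within(graph, "takeover", 2))
```

Concepts beyond the cut-off, such as `currency` in this toy network, play the part of the marginal ideas that are excluded from the summary.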

To the author’s knowledge, SCISOR can answer simple questions in English. 
A future aim is to produce summaries of more complicated events. 

GE NLToolset for information extraction from news about Latin Ameri- 
can terrorism. Like other systems, GE NLToolset has been adapted to the 
MUC-4 application about terrorism in Latin America. Figure 5.9 shows a typi- 
cal example of an article which is submitted for information extraction to all 
systems participating in the test. The answer key template in Fig. 5.10 defines 
the target output for the systems. It deals with the bombing of R.G. Alvarado’s 
car. A second template (left out) assembles the information about the attack 
on Merino’s home. 

As operationalized by the Message Understanding Conferences (see MUC-4), 
information extraction is the task of selecting from texts those information 
items which are defined as worth knowing by a domain-specific template or 
semantic frame and filling them into the appropriate slots of the template. In- 
formation extraction requires some understanding of the texts, but it presents a 
more limited challenge than would a task requiring production of an in-depth 
representation of the contents of complete texts. The output of information ex- 
traction resembles a partially formatted database. Current database searching 
techniques can be applied to it. 

The application domain typically comprises short event-oriented texts such 
as news articles or navy operation reports which occur in large quantities and 
describe the events in a domain. Domains differ in their knowledge back- 
ground, but not so much in their structure. 

First and foremost, the information extraction task is interesting because of 
the practical need to improve text retrieval methods for large full text bases. A 
second attraction of the task is that it can serve as a test bed for current sys- 
tems. In the Message Understanding Conferences, a performance evaluation 
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methodology is applied. Metrics are adapted from the classical information re- 
trieval measures such as precision and recall. 

The task of information extraction can now be operationalized as filling the 
slots of the target templates correctly from text information, deriving precisely 
the prescribed values. Any deviation from the prescribed output is scored 
negatively. 
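
Comparing a system response with the answer key can be sketched as follows. The sketch is a strong simplification: it assumes one fill per slot, exact string matching, and no partial credit, which is not how the official scoring works in detail. The slot names follow Fig. 5.10; the erroneous response values are invented for the example.

```python
from typing import Dict, Optional, Tuple

Template = Dict[str, Optional[str]]


def compare(key: Template, response: Template) -> Tuple[int, int, int, int]:
    """Count correct, incorrect, missing, and spurious slot fills."""
    correct = incorrect = missing = spurious = 0
    for slot in key.keys() | response.keys():
        expected, actual = key.get(slot), response.get(slot)
        if expected is None and actual is None:
            continue                      # slot empty on both sides
        if expected is None:
            spurious += 1                 # filled although the key is empty
        elif actual is None:
            missing += 1                  # key fill not found by the system
        elif expected == actual:
            correct += 1
        else:
            incorrect += 1
    return correct, incorrect, missing, spurious


key = {
    "INCIDENT: TYPE": "BOMBING",
    "INCIDENT: LOCATION": "EL SALVADOR: SAN SALVADOR (CITY)",
    "PERP: ORGANIZATION ID": "FMLN",
    "PHYS TGT: FOREIGN NATION": None,
}
# Hypothetical system output with one error of each kind.
response = {
    "INCIDENT: TYPE": "BOMBING",
    "INCIDENT: LOCATION": None,
    "PERP: ORGANIZATION ID": "ARENA",
    "PHYS TGT: FOREIGN NATION": "USA",
}
print(compare(key, response))
```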



SAN SALVADOR, 19 APR 89 (ACAN-EFE) - [TEXT] SALVADORIAN PRESIDENT-ELECT
ALFREDO CRISTIANI CONDEMNED THE TERRORIST KILLING OF ATTORNEY GENERAL 
ROBERTO GARCIA ALVARADO AND ACCUSED THE FARABUNDO MARTI NATIONAL 
LIBERATION FRONT (FMLN) OF THE CRIME. 

LEGISLATIVE ASSEMBLY PRESIDENT RICARDO VALDIVIESO AND VICE PRESIDENT- 
ELECT FRANCISCO MERINO ALSO DECLARED THAT THE DEATH OF THE ATTORNEY 
GENERAL WAS CAUSED BY WHAT VALDIVIESO TERMED THE GUERRILLAS’ 
“IRRATIONAL VIOLENCE”. 

GARCIA ALVARADO, 56, WAS KILLED WHEN A BOMB PLACED BY URBAN GUERRILLAS 
ON HIS VEHICLE EXPLODED AS IT CAME TO A HALT AT AN INTERSECTION IN
DOWNTOWN SAN SALVADOR. 

“WE HAVE TO CONDEMN THIS INCIDENT, IT IS A GUERRILLA ACT”, ALFREDO
CRISTIANI, NATIONALIST REPUBLICAN ALLIANCE (ARENA) PRESIDENT-ELECT, WHO
WILL REPLACE CHRISTIAN DEMOCRAT JOSE NAPOLEON DUARTE ON 1 JUNE,
STATED. 

ACCORDING TO CRISTIANI, THE ATTACK TOOK PLACE BECAUSE ATTORNEY GENERAL 
GARCIA ALVARADO WARNED THAT “HE WOULD TAKE MEASURES AGAINST URBAN 
TERRORISTS.” 

VICE PRESIDENT-ELECT FRANCISCO MERINO SAID THAT WHEN THE ATTORNEY 
GENERAL’S CAR STOPPED AT A LIGHT ON A STREET IN DOWNTOWN SAN SALVADOR, 
AN INDIVIDUAL PLACED A BOMB ON THE ROOF OF THE ARMORED VEHICLE. 

“THE DRIVER TOLD THE ATTORNEY GENERAL ABOUT THE BOMB. THE VEHICLE
SWERVED AND THE BOMB EXPLODED, CAUSING THE TOP OF THE VEHICLE TO 
COLLAPSE ON THE ATTORNEY GENERAL’S HEAD,” MERINO STATED. 

GUERRILLAS ATTACKED MERINO’S HOME IN SAN SALVADOR 5 DAYS AGO WITH 
EXPLOSIVES. THERE WERE SEVEN CHILDREN, INCLUDING FOUR OF THE VICE 
PRESIDENT’S CHILDREN, IN THE HOME AT THE TIME. A 15-YEAR-OLD NIECE OF 
MERINO’S WAS INJURED. 

RICARDO VALDIVIESO, PRESIDENT OF THE LEGISLATIVE ASSEMBLY AND AN ARENA
LEADER, SAID THE FMLN AND ITS “FRONT” GROUPS ARE RESPONSIBLE FOR THE
“IRRATIONAL VIOLENCE THAT KILLED ATTORNEY GENERAL GARCIA.”

ACCORDING TO THE POLICE AND GARCIA ALVARADO’S DRIVER, WHO ESCAPED
UNSCATHED, THE ATTORNEY GENERAL WAS TRAVELING WITH TWO BODYGUARDS. 
ONE OF THEM WAS INJURED. 

THE ATTORNEY GENERAL’S BODY WAS DESTROYED BY THE BOMB THAT EXPLODED 
OVER HIS HEAD. 

NO GROUP HAS CLAIMED CREDIT FOR THE ATTACK YET, BUT POLICE SOURCES 
CLAIM IT “IS CHARACTERISTIC OF THE FMLN URBAN COMMANDOS.”



Fig. 5.9. The MUC-4 walkthrough text TST2-0048 









0.  MESSAGE: ID                      TST2-MUC4-0048
1.  MESSAGE: TEMPLATE                1
2.  INCIDENT: DATE                   19 APR 89
3.  INCIDENT: LOCATION               EL SALVADOR: SAN SALVADOR (CITY)
4.  INCIDENT: TYPE                   BOMBING
5.  INCIDENT: STAGE OF EXECUTION     ACCOMPLISHED
6.  INCIDENT: INSTRUMENT ID          “BOMB”
7.  INCIDENT: INSTRUMENT TYPE        BOMB: “BOMB”
8.  PERP: INCIDENT CATEGORY          TERRORIST ACT
9.  PERP: INDIVIDUAL ID              “URBAN GUERRILLAS” / “URBAN TERRORISTS” /
                                     “FMLN URBAN COMMANDOS” / “URBAN COMMANDOS”
10. PERP: ORGANIZATION ID            “FMLN” / “FARABUNDO MARTI NATIONAL
                                     LIBERATION FRONT”
11. PERP: ORGANIZATION CONFIDENCE    SUSPECTED OR ACCUSED BY AUTHORITIES:
                                     “FMLN” / “FARABUNDO MARTI NATIONAL
                                     LIBERATION FRONT”
12. PHYS TGT: ID                     “VEHICLE” / “CAR” / “ATTORNEY GENERAL’S CAR” /
                                     “ARMORED VEHICLE”
13. PHYS TGT: TYPE                   TRANSPORT VEHICLE: “VEHICLE” / “CAR” /
                                     “ATTORNEY GENERAL’S CAR” / “ARMORED VEHICLE”
14. PHYS TGT: NUMBER                 1: “VEHICLE” / “CAR” / “ATTORNEY GENERAL’S CAR” /
                                     “ARMORED VEHICLE”
15. PHYS TGT: FOREIGN NATION         -
16. PHYS TGT: EFFECT OF INCIDENT     SOME DAMAGE: “VEHICLE” / “CAR” /
                                     “ATTORNEY GENERAL’S CAR” / “ARMORED VEHICLE”
17. PHYS TGT: TOTAL NUMBER           -
18. HUM TGT: NAME                    “ROBERTO GARCIA ALVARADO”


19. HUM TGT: DESCRIPTION             “ATTORNEY GENERAL”: “ROBERTO GARCIA ALVARADO”
                                     “DRIVER”
                                     “BODYGUARDS”
                                     ? “BODYGUARDS”
20. HUM TGT: TYPE                    LEGAL OR JUDICIAL / GOVERNMENT OFFICIAL:
                                     “ROBERTO GARCIA ALVARADO”
                                     CIVILIAN: “DRIVER”
                                     SECURITY GUARD: “BODYGUARDS”
                                     ? SECURITY GUARD: “BODYGUARDS”
21. HUM TGT: NUMBER                  1: “ROBERTO GARCIA ALVARADO”
                                     1: “DRIVER”
                                     2: “BODYGUARDS”
                                     1: “BODYGUARDS”
22. HUM TGT: FOREIGN NATION          -
23. HUM TGT: EFFECT OF INCIDENT      DEATH: “ROBERTO GARCIA ALVARADO”
                                     NO INJURY: “DRIVER”
                                     INJURY: “BODYGUARDS”
                                     ? NO INJURY: “BODYGUARDS”
24. HUM TGT: TOTAL NUMBER            -



Fig. 5.10. The MUC-4 walkthrough TST2-0048 answer key template

The customized version of the NLToolset has a preprocessor that uses lexico-semantic
patterns to perform some initial segmentation of the text, identifying
phrases that activate templates, filtering out irrelevant text, combining and 
collapsing some linguistic constructs, and marking portions of text that could 
describe discrete events. Linguistic processing combines parsing and word- 
specific semantic interpretation with domain-driven conceptual processing as 
described above for SCISOR. The knowledge base of the system consists of a 
feature and function grammar with associated linguistic relations and a core 
sense-based lexicon. The core lexicon contains over 10 000 entries, 37 of 
which are restricted to a specialized usage in the terrorism domain, such as 
device, which always means a bomb. The core grammar contains about 170 
rules with 50 relations and 80 additional subcategories. For use in the MUC-4
domain, 23 changes in the grammatical knowledge base were necessary. Eight 
grammar rules were added, most of them dealing with the noun phrases that in 
the corpus describe organizations. 

GE NLToolset performed well in this application. The program successfully 
interpreted most of the key sentences, but it missed some references and failed 
to tie some additional information to the main event. As a result, the system 
filled two templates for what should have been one. It derived 53 slots out of a 
possible 52, with 34 correct, 19 with missing content, and 19 with spurious 
content. NLToolset was scored 0.65 for recall, 0.64 for precision, and 0.35 for 
overgeneration. These scores imply that systems such as the SCISOR-based
NLToolset can in principle manage the information extraction task, although
they differ in the quality of their results.
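
The relation between the slot counts and the scores can be checked with a few lines of arithmetic. The sketch assumes the simplified definitions recall = correct / possible, precision = correct / actual, and overgeneration = spurious / actual, without the partial-credit terms of the official scoring. The function name is hypothetical.

```python
from typing import Tuple


def muc_scores(correct: int, possible: int, actual: int, spurious: int) -> Tuple[float, float, float]:
    """Simplified MUC-style scores without partial credit."""
    recall = correct / possible          # share of key fills that were found
    precision = correct / actual         # share of system fills that are right
    overgeneration = spurious / actual   # share of system fills that are spurious
    return recall, precision, overgeneration


# Slot counts as reported above for the walkthrough text.
recall, precision, overgeneration = muc_scores(
    correct=34, possible=52, actual=53, spurious=19
)
print(f"{recall:.2f} {precision:.2f} {overgeneration:.2f}")
```

Under these assumptions recall and precision come out at 0.65 and 0.64, as reported. The overgeneration ratio rounds to 0.36 rather than 0.35, which may be due to truncation or to scoring details not modeled here.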



5.3.5 PAULINE: Pragmatic aspects of text production 

There are, as we all know, many ways of saying the same thing. We consider 
it normal to adapt ourselves to the situation and our audience when we talk or
write. When addressing a communication partner in our daily lives we pursue 
pragmatic goals in addition to the mere imparting of information: we convey 
emotions, give incentives to act, take sides, politely avoid difficult topics, es- 
tablish familiarity or personal distance. In everyday communication we also 
adapt our summaries to the interests and preconceived ideas of our partners. 
When we summarize scientific or technical information, we have to respect 
the prior knowledge and belief structure of our audience as well, and possibly 
also their emotional response. It is a condition for successful communication 
that we tailor our summaries to the needs of their users. Computer generated 
summaries are no exception to this rule. The summarization systems that we 
have looked at so far have, however, not particularly emphasized user adaptation
of their products. In particular, they made no attempt to tailor their summaries
to individual communication partners as humans regularly do.

PAULINE (HOVY88) demonstrates how summarization can be made flexi- 
ble so that it respects the demands of the individual summary user. PAULINE 
has been realized as a model for rhetorical-pragmatic text organization. It is 
capable of producing more than 100 different texts from one semantic represen- 
tation. So that PAULINE can organize its texts pragmatically, it receives an 
initial collection of behavior-guiding information. Thus, among other things, it 
has a characterization of possible conversational environments. In addition, it 
pursues pragmatic goals. They describe what effect PAULINE is supposed to 
have on its audience, e.g., a change in future behavior or simply better infor- 
mation. Moreover, the user gives PAULINE a few conversation topics. It then 
does the following (see Fig. 5.11): 

• The system collects elements from its representation of a story that fit a 
given topic. 

• It orders them, interprets them and organizes them in sentences. 

• It constructs a surface sentence for each topic, selects suitable phrases and 
words, and determines the sequence of the elements. 

• It moves on to the next sentence. 



Fig. 5.11. PAULINE’s system architecture (Source: HOVY88). Labels in the figure:
input topics; topic collection (CONVINCE, RELATE, DESCRIBE); interpretation, new
topics, juxtaposition, ordering; sentence type, organization, clauses, words;
input: pragmatic aspects of conversation.



This list of system activities may seem normal. From the episode network that 
represents the internal knowledge of an event, PAULINE selects and presents
items. What makes PAULINE so interesting for the purpose of summarization
are the relevance criteria that are respected during the production of an utter- 
ance. PAULINE knows different scenarios and tailors the utterance to a spe- 
cific listener and situation - it or she (Pauline is said to be the author’s sister) 
pursues different communication aims. 

In the examples in Fig. 5.12, the goals formality, partiality, detail, and haste 
have different values and explain how PAULINE adapts the same information 
to different listeners and contexts. 

In case 1, PAULINE must inform an acquaintance of the outcome of a US 
primary and of the current status of the competing delegates. Neither interlocutor
has opinions about the topic. Both know the US electoral process. Consequently,
PAULINE’s characterizations of the speaker and the listener have
their default values: normal interest in the topic, no sympathies or antipathies, 
calm emotional state, informal setting, normal conditions. In addition, the 
system is given the following interpersonal goals: 

• Hearer: 

affect his or her knowledge: inform 
affect her or his opinion of topic: no effect 
involve her or him in the conversation: no effect 
affect his or her emotional state: no effect 
affect his or her goals: no effect 

• Speaker-hearer relationship: 

affect hearer’s emotion toward speaker: make like 

affect relative status: make equal 

affect interpersonal distance: make distant 

These values activate the following rhetorical goals: 

• formality = colloquial 

• partiality = impartial 

• detail = details 

• haste = somewhat planned 

In case 2, PAULINE is sympathetic to Kennedy, while the listener, the pro- 
gram’s knowledgeable sibling, supports Carter. In this case, the hearer is de- 
fined as having the knowledge state expert, with depth of acquaintance inti- 
mate, relative social status equal, and emotion like. The program has the goals 
to make close the interpersonal distance and to inform the sibling. These values 
activate the rhetorical goals as follows: 

• formality = colloquial 

• partiality = implicit 

• detail = details 

• haste = somewhat planned 

The result is much shorter than in case 1.

In case 3, PAULINE is a Carter supporter and is speaking formally - say, 
making a speech at a parliamentary debate - so that the conversation time is 
only some and the tone is formal. The audience is presupposed to support Ken- 
nedy. Thus PAULINE has the goal to switch the hearers’ opinions of the topic, 
to make them respect the speaker, and yet feel distant. These values activate 
the rhetorical goals 

• formality = highfalutin 

• partiality = explicit 

• detail = details 

• haste = somewhat planned 

Again, the resulting speech adapts to the situation and presents facts in a dif- 
ferent fashion. 

PAULINE’S system architecture points out that the pragmatic intentions in 
utterances are not only a phenomenon of the formulation or the text surface. It 
is easy to see that already the choice of information to be expressed can be 
guided by pragmatic intentions, for example, by the wish to avoid conflicts. 



Case 1.

On 20 February, Carter and Kennedy were the candidates in a primary in Michigan. 
Carter lost to Kennedy by 1335 votes. Kennedy has a better chance of getting the 
nomination than before at present. Carter is also closer to getting the nomination than 
before. Both Carter and Kennedy want to get the nomination. 

Case 2. 

Well, so Carter lost the primary to Kennedy by 1335 votes. 

Case 3. 

I am pleased to inform you that Carter has improved his chances of winning the 
nomination. At the present time, Carter has many more delegates than he had in the
past; also, Carter has many more than Kennedy does.



Fig. 5.12. PAULINE adapting to listeners and situations 



5.4 New technology, increased demand, a new wave of systems 



Whereas the pioneers of automated abstracting had limited options in terms of 
storage, processing speed, and character sets, later designers of summarization 
systems were granted the freedom to constructively think of interactivity,
visualization, and multimedia solutions. The general advance in computerization
has widened the range of domains where computers are used - just think of ac- 
countancy, home banking, or electronic publishing. Today, computerized word 
processing systems have come into their own as a tool and medium for text 
production. We find large full-text databases which are often created inside 
corporations for their own purposes. Other databases are offered commercially, 
for instance by newspapers or specialized information services. The problem of 
acquiring machine-readable full texts, which in the era of print media seemed 
to make the routine application of summarizing systems illusory, has vanished. 

Moreover, nobody can read all these full-text databases at a computer
screen, or print them for inspection. Therefore, complying with short-term de- 
mands is an issue in automatic summarizing in the 1990s. It explains the re- 
newed interest in sentence extraction, the only domain-independent summari- 
zation technique that works in application environments. Improved and re-en- 
gineered extraction techniques can serve some of the most pressing purposes. 
More sophisticated systems take more development time, but they also profit 
from the reinforced demand to have them for applications. This new research 
situation has brought zest to the thorny area of serious summary evaluation
(see HAND97, JONE93), and it speeds up the development of the whole field 
in a sort of post-modern coexistence of all helpful contributions, be their flag- 
ship the robustness of their methods, their portability, their cognitive adequacy, 
or their multimedia opening. 

As a consequence, the reader will find very different systems and approaches 
in the following section (see Table 5.2). First, recent statistical work is re- 
viewed. The following group is characterized by the integration of corpus lin- 
guistics, shallow parsing methods, and lexical semantics knowledge, most 
typically provided by WordNet. Next, two approaches deal with the operation- 
alization of discourse structures and their use for coherent summary production. 
In the next group, constituted by the SUMMARIST and the SIMPR systems, 
the skilled integration of methods from different backgrounds is the most in- 
teresting feature. The developers of STREAK and SUMMONS, the next sys- 
tems in the table, focus on summary text generation starting from structured 
knowledge. The list concludes with an effort to integrate information from dif- 
ferent media, in this case provided by a battlefield simulator, into a textual 
summary. 
Table 5.2. Overview of recent summarization systems and approaches

| Approach | Tasks addressed | Methods | Special features | Source data |
|---|---|---|---|---|
| Singhal et al. 97 | Paragraph extraction | Statistical, word count, similarity metric | Text segmentation | Encyclopedia articles |
| Kupiec et al. 95 | Sentence extraction | Statistical | Text segmentation, training, combined heuristics | Engineering articles with abstracts |
| Teufel / Moens 97 | Sentence extraction | Statistical | Text segmentation, training, combined heuristics | Computational linguistics papers with abstracts |
| Boguraev / Kennedy 97 | “Capsule overviews”, “topic stamps” | Corpus linguistics, lexical morphology, statistics | Local salience metrics, referent tracking | Short computer articles |
| Barzilay / Elhadad 97 | Sentence extraction | Lexical chains | WordNet semantic relations for topic identification | Short computer articles |
| Ono et al. 94 | Abstracting | Linguistic surface processing, discourse relations | Reconstruction of the RST relations | Japanese editorials, technical papers |
| Marcu 97 | Importance scoring | Corpus linguistics, statistical, discourse relations, empirical | Reconstruction of the RST relations | Five texts from Scientific American |
| SUMMARIST 97 | Concept extraction / summarizing | Combined statistical and lexical data | Open integrative structure, WordNet use | Press articles, Wall Street Journal |
| SIMPR 90-93 | Indexing | Text segmentation, phrase extraction, indexing knowledge | Surface constraint grammar, very large text bases | Technical documentation, very large sizes |
| STREAK 95 | Summary generation | Knowledge processing, empirical | Event data summary production | Basketball box scores |
| SUMMONS 95 | Summary generation | Combination of operators, empirical | Integration of different sources | MUC templates, time-dependent news, corporate takeovers, terrorism |
| Maybury 95 | Multimedia summarizing | Statistical, knowledge processing | Multimodal integration of sources | Battlefield event data, numerical and text |



5.4.1 New extraction systems

Sentence extraction has been proposed as a surrogate for summarizing for as 
long as researchers have thought about automatic summarizing. The most re- 
presentative sentences were selected according to metrics that were easy to 
establish. The informative value of a sentence was calculated from features of 
the linguistic surface. Since anaphora resolution was out of reach, word count 
could not include nominations of a concept by pronominal forms, or by hy- 
pernyms. Concatenated sentences that had been dragged from different con- 
texts made up the extracts. They inevitably presented problems of coherence 
and readability. 

Extraction from the original document is still the most practicable path to 
systems that do something approximating summarization. One recent innova- 
tion is to extract paragraphs instead of sentences; another research line is to 
train extraction systems on human abstracts or relevance judgements. 

Extracting paragraphs instead of sentences. To improve coherence in ex- 
tracts, it helps to some extent to extract larger units from the original: since 
paragraphs give more context and are normally quite self-contained, paragraph 
extraction avoids some of the evident coherence problems such as dangling 
references. Figure 5.13 shows an extract of an encyclopedia article about 
telecommunications which follows the segmented bushy path (see below). 

Paragraph extraction is proposed by MITR97 and SALT97. The authors work 
in the environment of the Smart System (SALT71). They assume that every 
text or text excerpt can be represented in vector form as a set of weighted 
terms. Then they compute pairwise similarity coefficients, showing the similar- 
ity between pairs of texts, based on coincidences in the term assignment to the 
respective items. The similarity function is normalized to range between 0 for 
disjoint vectors and 1 for identical ones. Paragraphs are represented as nodes 
and joined by links based on the numerical similarity computed for each pair 
of paragraphs. Since the similarity is based on the vocabulary overlap between 
them, a sufficiently large similarity shows that the vocabulary of two paragraphs
matches in a meaningful way and allows the interpretation that the two partial 
texts are semantically related. A segment is a contiguous piece of text that is 
linked internally, but largely disconnected from the adjacent text 
(HEAR93/94). Segments or nodes that have many links to other nodes are 
called “bushy”. Since a highly bushy node (paragraph) has an overlapping vo- 
cabulary with many other paragraphs it is likely to discuss topics covered in 
several paragraphs, i.e., it is likely to be an overview paragraph and desirable 
in an extract. 
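
The similarity and bushiness computation just described can be sketched in a
few lines. This is a minimal illustration, not the Smart implementation: the
raw word counts used as term weights, the link threshold of 0.2, and the
function names are assumptions made for the example, and text segmentation is
left out.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as term weights (raw counts here, for simplicity)."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Similarity between 0 (disjoint vectors) and 1 (identical vectors)."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bushiness(paragraphs, threshold=0.2):
    """Count, for each paragraph, its links to sufficiently similar paragraphs."""
    vectors = [term_vector(p) for p in paragraphs]
    links = [0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links[i] += 1
                links[j] += 1
    return links


def bushy_extract(paragraphs, n, threshold=0.2):
    """Indices of the n bushiest paragraphs, returned in text order."""
    links = bushiness(paragraphs, threshold)
    ranked = sorted(range(len(paragraphs)), key=lambda i: (-links[i], i))
    return sorted(ranked[:n])


paragraphs = [
    "Modems connect computers to telephone lines.",
    "Computers, terminals and modems exchange data over telephone lines and networks.",
    "Networks link computers and terminals.",
    "Cats sleep all day.",
]
print(bushiness(paragraphs))         # link count per paragraph
print(bushy_extract(paragraphs, 2))  # the two bushiest paragraphs, in text order
```

In the toy input, the second paragraph shares vocabulary with both of its
neighbours and therefore collects the most links, which is the behaviour one
expects of an overview paragraph.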

An extract can be composed by selecting paragraphs from the document 
such that bushy nodes are selected which follow each other in the order of the 
text. Each segment is represented by at least one paragraph. As each paragraph 
is similar to the next on the extraction path, the extract should be coherent. 

The authors evaluated their results. Measured by the overlap with an ideal
(manual) extract, they reached a low overlap just as human summarizers do; 
they also found their method in the same performance class as a simple ex- 
traction of the document beginning (see BRAN95). Summarization by extrac- 
tion, the authors concluded, may not be feasible given the wide variation be- 
tween users and the parts of the original document that they want to be repre- 
sented in a summary. 



20% segmented bushy path 

Para 5: The devices used in telecommunications can be computers, terminals 
(devices that transmit and receive information), and peripheral equipment such 
as printers (see Computer; see Office systems). The transmission line used ... 

Para 14: Among the different kinds of software are terminal-emulation, file- 
transfer, host, and network software. Terminal-emulation software makes it 
possible for a device to perform the same function as a terminal. File-transfer 
software is... 

Para 16: Three major categories of telecommunication applications can be 
discussed here: host-terminal, file-transfer, and computer-network communica- 
tions. 

Para 32: An information-retrieval service leases time on a host computer to
terminals, so that these terminals are able to retrieve information from the host 
computer. An example is CompuServe Information Services. To gain access 
to... 

Para 39: Certain telecommunication methods have become standard in the 
telecommunications industry as a whole, because if two devices use different 
standards they are unable to communicate properly. Standards are developed in 



Fig. 5.13. Extract from article Telecommunications, following the segmented bushy 
path (from MITR97) 

Training summarizers on “gold standards” (aligned and human-selected
summary sentences). In KUPI95, the combined sentence extraction methods 
proposed by EDMU69 recur. We find a sentence length feature, the indicator 
phrase, the location method (now called paragraph feature), the thematic 
word, and the proper name heuristic. In a replication study by TEUF97, almost 
the same heuristics are used. They are listed below. Where the two studies use
different names for (almost) the same heuristic, the name preferred by KUPI95 is
given in parentheses.

• The cue phrase method (fixed-phrase feature) selects sentences containing
fixed phrases from a list. KUPI95 use 26 indicator phrases, two-word
phrases such as this letter or In conclusion. They also accept section
headings (e.g., Conclusion) as fixed phrases. The cue phrase method of
TEUF97 works with a list of 1670 positive and negative cues, indicator
phrases, and formulaic expressions. The cue phrases were manually sorted
into 5 classes corresponding to the likelihood that a sentence containing
the cue is included in a summary.

• The location method (paragraph feature) reflects the fact that paragraphs at 
the start and end of a document are more likely to contain material that is 
useful for a summary. Paragraphs also carry crucial information at the be- 
ginning and the end. Therefore, sentences in peripheral document para- 
graphs are good summary candidates, even more so if they occur at the 
paragraph boundaries. 

• The sentence length method (sentence length cut-off feature) assigns a
score of 0 to all sentences below a length threshold and a score of 1 to all
sentences above it. This method is the least effective.

• The thematic word method (thematic word feature) tries to identify key
words that are characteristic of the document. KUPI95 use the most
frequent content words. TEUF97 concentrate on non-stop-list words that
occur frequently in the document but are rare in the overall collection
(calculated by the standard term frequency divided by document frequency
measure). Sentences that contain clusters of such thematic words should
be characteristic of the document. The ten top-scoring words are chosen
as thematic words. Sentence scores are computed as a weighted count of
thematic words in sentences, normalized by sentence length.

• The title method scores a sentence by the mean frequency of title word
occurrences, excluding stop words. It is used by TEUF97.

• The uppercase word feature of KUPI95 aims at proper names and acro- 
nyms. 
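
To make the heuristics listed above concrete, the following sketch turns one
sentence into a set of discrete feature values. It is an illustration under
stated assumptions, not the feature extractor of KUPI95 or TEUF97: the cue
phrase list contains only the two examples quoted above, the length threshold
of five words is invented, and the function and feature names are hypothetical.

```python
import re

# Only the two examples quoted in the text; the published lists are longer.
CUE_PHRASES = ("in conclusion", "this letter")
# Illustrative cut-off in words, not a published value.
LENGTH_THRESHOLD = 5


def sentence_features(sentence, paragraph_index, paragraph_count, thematic_words):
    """Return discrete feature values for one sentence."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
    lowered = sentence.lower()
    return {
        # Cue phrase (fixed-phrase) feature.
        "cue_phrase": any(cue in lowered for cue in CUE_PHRASES),
        # Location feature: first or last paragraph of the document.
        "peripheral_paragraph": paragraph_index in (0, paragraph_count - 1),
        # Sentence length cut-off feature.
        "long_enough": len(words) > LENGTH_THRESHOLD,
        # Thematic word feature.
        "thematic": any(w.lower() in thematic_words for w in words),
        # Uppercase word feature; the sentence-initial word is skipped
        # because it is capitalized in any case.
        "uppercase": any(w[0].isupper() for w in words[1:]),
    }


example = sentence_features(
    "In conclusion, the SMART system extracts whole paragraphs.",
    paragraph_index=9,
    paragraph_count=10,
    thematic_words={"paragraphs"},
)
print(example)
```

Feature dictionaries of this kind are the input that a trained classifier
needs in order to rank sentences.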

Where EDMU69 weighted his heuristics manually to make them successful, 
KUPI95 and TEUF97 train them on a corpus before applying them to extract
sentences. They approach extract selection as a statistical classification prob- 
lem. Given a training set of documents with hand-selected document extracts, 
they apply a function that estimates the probability that a given sentence is 
included in the extract. New extracts can then be generated by ranking sen- 
tences according to this probability and selecting a user-specified number of 
the top scoring ones. 

Assuming the statistical independence of the features, the probability of a 
feature-value pair occurring and the probability of a feature-value pair occur- 
ring in the summary can be estimated from the corpus. 

The classifier computes the probability that a sentence will be included in a 
summary as follows: 

\[
P(s \in S \mid F_1, \ldots, F_k) \approx \frac{P(s \in S) \prod_{j=1}^{k} P(F_j \mid s \in S)}{\prod_{j=1}^{k} P(F_j)}
\]

$P(s \in S \mid F_1, \ldots, F_k)$: Probability that sentence $s$ in the source text
is included in summary $S$, given its feature values

$P(s \in S)$: Compression rate (constant)

$P(F_j \mid s \in S)$: Probability of feature-value pair occurring in a sentence which is
in the summary

$P(F_j)$: Probability that the feature-value pair occurs unconditionally

$k$: Number of feature-value pairs

$F_j$: $j$-th feature-value pair
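
The classifier can be sketched as follows. The code estimates the probabilities
from a small hand-made training set and ranks sentences by the product given
above. It is a minimal sketch, not the authors' implementation: the add-one
smoothing (which presupposes two-valued features), the toy training data, and
all names are assumptions introduced for the example.

```python
from collections import Counter


def train(sentences):
    """Count feature-value pairs overall and inside summaries.

    sentences is a list of (features, in_summary) pairs, where features is
    a dict of feature values and in_summary is True or False.
    """
    overall = Counter()
    in_summary_counts = Counter()
    summary_size = 0
    for features, in_summary in sentences:
        summary_size += in_summary
        for pair in features.items():
            overall[pair] += 1
            if in_summary:
                in_summary_counts[pair] += 1
    return {
        "total": len(sentences),
        "summary_size": summary_size,
        "overall": overall,
        "in_summary": in_summary_counts,
    }


def score(model, features):
    """P(s in S) * prod P(F_j | s in S) / prod P(F_j), add-one smoothed."""
    result = model["summary_size"] / model["total"]  # compression rate
    for pair in features.items():
        p_given_summary = (model["in_summary"][pair] + 1) / (model["summary_size"] + 2)
        p_overall = (model["overall"][pair] + 1) / (model["total"] + 2)
        result *= p_given_summary / p_overall
    return result


def extract(model, candidates, n):
    """Indices of the n top-scoring candidate sentences, in text order."""
    ranked = sorted(range(len(candidates)), key=lambda i: -score(model, candidates[i]))
    return sorted(ranked[:n])


training = [
    ({"cue": True, "initial": True}, True),
    ({"cue": True, "initial": False}, True),
    ({"cue": False, "initial": True}, False),
    ({"cue": False, "initial": False}, False),
    ({"cue": False, "initial": False}, False),
    ({"cue": False, "initial": True}, False),
]
model = train(training)
with_cue = {"cue": True, "initial": True}
without_cue = {"cue": False, "initial": False}
print(score(model, with_cue), score(model, without_cue))
print(extract(model, [without_cue, with_cue, without_cue], 1))
```

Because of the independence assumption and the smoothing, the value is best
read as a ranking score rather than as a calibrated probability.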

The training corpus of KUPI95 consists of 188 document/summary pairs from 
engineering. The summaries are created by professional abstractors. On aver- 
age, they are three sentences long. Before training, the sentences of the sum- 
mary must be matched with the respective document sentences. This allows 
the authors to calculate which features made it probable that some sentences 
were chosen for the summary. Indeed, 79% of the 568 summary sentences (451 
sentences) had direct matches in the document, i.e., sentences which could be 
extracted from the original verbatim or with minor modifications, and 19 sum- 
mary sentences revealed themselves as joins combined from more than one 
original sentence. 

For each test document, the trained summarizer provided as many sentences
as the manual summary. It replicated 35% of the 568 sentences in the
hand-crafted summaries. Of the 498 matching sentences, it discovered 211 (42%)
correctly. Table 5.3 shows how the different extraction heuristics perform.



Table 5.3. Performance of extraction heuristics (from KUPI95)

| Feature | Individual score (sentences correct) | Cumulative score (sentences correct) |
|---|---|---|
| Paragraph | 163 (33%) | 163 (33%) |
| Fixed phrases | 145 (29%) | 209 (42%) |
| Length cut-off | 121 (24%) | 217 (44%) |
| Thematic word | 101 (20%) | 209 (42%) |
| Uppercase word | 100 (20%) | 211 (42%) |



The most effective heuristic selects beginning and end paragraphs of documents
(the location method). It is followed by the cue phrase method, here called
fixed phrases. Less impressive are the outcomes of the sentence length
criterion, the thematic word heuristic, and proper name driven extraction. The best
combination of heuristics is paragraph + fixed phrase + sentence length.
Figure 5.14 shows an example of a manual abstract and the corresponding
generated extract.

TEUF97 replicate and expand the experiment of KUPI95. Instead of engi- 
neering papers with abstracts written by professionals, they use 220 papers 
from computational linguistics of some 6-8 pages which have author abstracts. 
The average length of the papers is 210 sentences, the average length of the 
summaries 4.7 sentences. 

In their corpus, only 17.8% of the summary sentences align with document 
sentences. Therefore, TEUF97 add a second “gold standard”, sentences chosen 
as abstract-worthy by a human judge. Four document sets are used to train the 
extraction model with five different heuristics: the cue phrase and location 
methods, the sentence length and thematic word method, and the title word 
method. Table 5.4 gives an overview of their results on the most extended 
training set, using both “gold standards”. The authors compare the performance 
of the extraction methods with a baseline that consists of simply extracting the 
beginning of the document. 

Abstract 

The work undertaken examines the drawability of steel wire rod 
with respect to elements that are not intentionally added to steel. 

Only low carbon steels were selected for experimentation. During 
wire drawing, failure-inducing tensile forces are greatest at the 
center of the wire. This accounts for the classic appearance of 
ductile failure with the center of the wire failing in a ductile man- 
ner. 



Sentence extracts 



• Drawability of low carbon steel wire

• The work undertaken examines the drawability of steel wire 
rod with respect to elements that are not intentionally added 
to steel. 

• For this reason, only low carbon steels were selected for
experimentation. 

• During wiredrawing, failure-inducing tensile forces are 
greatest at the center of the wire. 

• This accounts for the classic appearance of ductile failure 
with the center of the wire failing in a ductile manner, while 
the circumference fails last, and in shear. 



Fig. 5.14. Sample manual abstract and corresponding sentence extracts 
(from KUPI95) 



Combined heuristics, as proposed by KUPI95 following EDMU69, also prove
useful in the study of TEUF97. Their method produces very short excerpts,
with compression to as little as 2-5% of the original. The overall performance
seems to correspond to the ratio of gold standard sentences to source
text sentences, i.e., the compression rate, whereas the nature of the two gold
standards has no apparent influence. Training the model with more data gave
no significant improvement. The authors conclude that overall results can be
improved by improving the individual heuristics, not by providing more training
material.



Table 5.4. Impact of individual and cumulated heuristics (from TEUF97)

| Method | Precision (%), individual | Precision (%), cumulated |
|---|---|---|
| Cue phrase method | 55.2 | 55.2 |
| Location method | 32.1 | 65.3 |
| Sentence length method | 28.9 | 66.3 |
| Thematic word method | 17.1 | 66.5 |
| Title word method | 21.7 | 68.4 |
| Baseline | 28.0 | 28.0 |



5.4.2 Referent tracking replaces word frequency counts 

Words were always counted in order to estimate the importance of the concept 
denoted by a certain word. What really counts, in the newer view, are the 
referents words point to. If we read ten statements about white elephants, then 
white elephants must somehow be a topical referent in the document. We
estimate their importance better if we manage to recognize all mentions,
regardless of whether they are expressed by a lexicalized word, by pronouns
(these, it), by alternating expressions (elephants, big animals, etc.) that avoid
repetition, by paraphrases, or whatever. The more completely all occurrences of a
referent (i.e., an object of the discourse) are counted, the better its importance 
can be approximated. In order to base an importance decision on the presence 
of referents instead of word frequencies, mechanisms of anaphora resolution 
and the identification of related concepts are needed. They can be managed 
with a lexicon or thesaurus and a shallow morphosyntactic analysis of noun 
phrases at hand. 
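
The difference between counting words and counting referents can be shown with
a toy example. The mentions below are assumed to be already resolved to their
referents by hand; the referent labels and function names are hypothetical,
and no anaphora resolution is performed by the code.

```python
from collections import Counter


def word_counts(mentions):
    """Count surface forms, as a word frequency method would."""
    return Counter(surface.lower() for surface, _ in mentions)


def referent_counts(mentions):
    """Count the referents that the surface forms point to."""
    return Counter(referent for _, referent in mentions)


# (surface form, referent) pairs in text order, resolved by hand.
mentions = [
    ("elephants", "ELEPHANT"),
    ("they", "ELEPHANT"),
    ("big animals", "ELEPHANT"),
    ("it", "ZOO"),
    ("elephants", "ELEPHANT"),
    ("these", "ELEPHANT"),
]
print(word_counts(mentions).most_common(1))      # the most frequent word form
print(referent_counts(mentions).most_common(1))  # the most frequent referent
```

The word count credits the elephants with two occurrences only, whereas the
referent count registers all five mentions.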

Salience-based content characterization by capsule overviews. BOGU97 
use capsule overviews. A capsule overview is a semi-formal representation of 
the document, derived through a process of data reduction over the original 
text (see example in Fig. 5.15). It comes nearer to an indexation than to a tex- 
tual summary. To obtain it, the authors identify highly salient phrases across 
the entire document. These are the candidate topic stamps. The set of topic 
stamps makes up the capsule overview of the document. 

Capsule overviews are generated by a procedure that includes the following
subtasks: 

• Discourse segmentation 

• Phrasal analysis of nominal expressions and relations 

• Anaphora resolution and generation of the referent set 

• Calculation of discourse salience and ranking of referents by segment 

• Identification of topic stamps 

• Enriching topic stamps with relational context(s). 



“One day, everything Bill Gates has sold you up to now, whether it’s Windows95 or
Windows97, will become obsolete,” declares Gilbert Amelio, the boss at Apple Computer.
“Gates is vulnerable at that point, and we want to make sure we’re ready to come forward 
with a superior answer.” 

Bill Gates vulnerable? Apple would swoop in and take Microsoft’s customers? Ridicu- 
lous! Impossible! In the last fiscal year, Apple lost $816 million; Microsoft made $2.2 
billion. Microsoft has a market value thirty times that of Apple. 

Outlandish and grandiose as Amelio’s idea sounds, it makes sense for Apple to think in
such big, bold terms. Apple is in a position where standing pat almost certainly means 
slow death. 

It’s a bit like a patient with a probably terminal disease deciding to take a chance on an 
untested but promising new drug. A bold strategy is the least risky strategy. As things 
stand, customers and outside software developers alike are deserting the company. Apple 
needs something dramatic to persuade them to stay aboard. A radical redesign of the 
desktop computer might do the trick. If they think the redesign has merit, they may feel 
compelled to get on the band wagon lest it leave them behind. 



Capsule overview of the segment 

... APPLE would swoop in... 

... take MICROSOFT’S customers? 

... APPLE lost $816 million; 

... MICROSOFT made $2.2 billion. 

... MICROSOFT has a market value ... 

... APPLE is in a position ... 

... APPLE needs something dramatic ... 



Fig. 5.15. Text segment with automatically generated capsule overview (adapted from 
BOGU97) 



Discourse segmentation works as described by HEAR94 and identifies topically
coherent sections of text using a lexical similarity measure. Its advantage
is to break down the problem of content characterization of a large text into
finding topic stamps for each segment in the document.

The analysis of phrases exploits the linguistic properties of technical terms.
They are multi-word noun phrases that can be characterized with respect to 
their preferred phrase structure, their behavior with respect to lexicalization, 
contraction patterns and other discourse properties. The TERM algorithm de- 
veloped by JUST95 is applied for term identification. However, it is expanded 
to deliver an extended phrase set, containing an exhaustive listing of the ob- 
jects mentioned in the text. 

Through the application of an anaphora resolution procedure (based on work 
of LAPP94), the extended phrase set is transformed into a set of expressions 
which uniquely identify the objects referred to in the text - the referent set. An 
antecedent for an anaphoric expression is located by first excluding all impos- 
sible candidate antecedents, then ranking the remaining candidates according 
to a local salience measure (see Fig. 5.16) and choosing the most salient can- 
didate. When an anaphoric link is established, the anaphor is added to the 
equivalence class to which the antecedent belongs, and the salience of the 
class is boosted accordingly. 



SENT: 100 iff the expression is in the current sentence. 

CNTX: 50 iff the expression is in the current discourse segment. 
SUBJ: 80 iff the expression is a subject. 

EXST: 70 iff the expression is in an existential construction. 
POSS: 65 iff the expression is a possessive. 

ACC: 50 iff the expression is a direct object. 

DAT: 40 iff the expression is an indirect object. 

OBLQ: 30 iff the expression is the complement of a preposition. 
HEAD: 80 iff the expression is not contained in another phrase. 
ARG: 50 iff the expression is not contained in an adjunct. 



Fig. 5.16. Salience factors used by BOGU97 
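
The selection of an antecedent with the help of the factors in Fig. 5.16 can be
sketched as follows. The sketch assumes that the local salience of a candidate
is the sum of the weights of the factors that apply to it, and it reduces the
exclusion of impossible candidates to a simple number agreement check. Both
simplifications, the candidate data, and the function names are assumptions
made for illustration.

```python
# Weights of the salience factors as listed in Fig. 5.16.
SALIENCE_WEIGHTS = {
    "SENT": 100, "CNTX": 50, "SUBJ": 80, "EXST": 70, "POSS": 65,
    "ACC": 50, "DAT": 40, "OBLQ": 30, "HEAD": 80, "ARG": 50,
}


def local_salience(factors):
    """Sum the weights of all factors that hold for an expression."""
    return sum(SALIENCE_WEIGHTS[factor] for factor in factors)


def resolve(anaphor_number, candidates):
    """Pick the most salient candidate that agrees with the anaphor.

    candidates is a list of (expression, number, factors) triples.
    Returns the chosen expression, or None if no candidate is possible.
    """
    possible = [c for c in candidates if c[1] == anaphor_number]
    if not possible:
        return None
    return max(possible, key=lambda c: local_salience(c[2]))[0]


# Hand-made candidates; the factor assignments are invented for the example.
candidates = [
    ("Apple", "sg", {"CNTX", "SUBJ", "HEAD", "ARG"}),
    ("customers", "pl", {"SENT", "CNTX", "ACC", "HEAD", "ARG"}),
    ("Microsoft", "sg", {"CNTX", "POSS"}),
]
print(resolve("sg", candidates))  # antecedent for a singular anaphor
print(resolve("pl", candidates))  # antecedent for a plural anaphor
```

Once a link is established, the anaphor would be added to the equivalence
class of its antecedent, so that later mentions raise the salience of the same
referent.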



Whereas local salience is bound to the segment in which phrases occur, dis- 
course salience reflects the properties of a referent in the whole text. It is the 
basis for an importance-based ranking of referents. By ranking the salience of 
the members of a referent set and the use of a threshold value, the entire set is 
reduced to the most prominent participants in the discourse, denoted by the 
topic stamps. 

Using lexical chains. BARZ97 summarize according to an algorithm that
identifies lexical chains in a text, merging knowledge from WordNet, a
part-of-speech tagger and shallow parser for nominal groups, and a text segmentation
algorithm derived from HEAR94. Picking the concepts represented by strong
lexical chains gives them a better indication of the central topic of a document
than simply picking the most frequent words in the text.
Relatedness of words (nouns and noun compounds) in a chain is determined by
the distance between them and the shape of the path connecting them in the 
WordNet Thesaurus. Extra-strong, strong, and medium-strong relations are 
distinguished. Extra-strong relations are established in the document without 
distance limits, whereas strong relations are followed only if the respective 
words occur in a window of seven sentences, and medium-strong relations 
within three sentences. Once strong chains have been selected, the full sen- 
tences from the original are extracted according to the chain distribution. 
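
The distance limits for the three relation types can be illustrated with a
small greedy chaining procedure. This is a sketch under assumptions: the
relation between two words is looked up in a hand-made toy function instead of
WordNet, a window of seven (three) sentences is read as a sentence distance of
at most seven (three), and the greedy strategy and all names are hypothetical
rather than the BARZ97 algorithm.

```python
# Maximum sentence distance per relation type; None means no limit.
MAX_DISTANCE = {"extra-strong": None, "strong": 7, "medium-strong": 3}


def may_link(relation, sentence_a, sentence_b):
    """Check whether a relation may be followed across the given distance."""
    if relation not in MAX_DISTANCE:
        return False
    limit = MAX_DISTANCE[relation]
    return limit is None or abs(sentence_a - sentence_b) <= limit


def build_chains(occurrences, relation_of):
    """Greedily attach each word occurrence to the first chain it can join.

    occurrences is a list of (word, sentence_index) pairs in text order;
    relation_of(word_a, word_b) returns a relation type or None.
    """
    chains = []
    for word, position in occurrences:
        for chain in chains:
            if any(may_link(relation_of(word, other), position, other_position)
                   for other, other_position in chain):
                chain.append((word, position))
                break
        else:
            chains.append([(word, position)])
    return chains


def toy_relation(word_a, word_b):
    """Stand-in for a thesaurus lookup; the pairs are invented."""
    if word_a == word_b:
        return "extra-strong"
    pair = frozenset((word_a, word_b))
    if pair == frozenset(("car", "automobile")):
        return "strong"
    if pair == frozenset(("car", "wheel")):
        return "medium-strong"
    return None


occurrences = [("car", 0), ("automobile", 5), ("wheel", 9), ("car", 40), ("tree", 41)]
chains = build_chains(occurrences, toy_relation)
print(chains)
```

In the example, the repetition of car in sentence 40 still joins the first
chain, whereas wheel in sentence 9 is too far from car in sentence 0 for a
medium-strong link and opens a chain of its own.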



5.4.3 From discourse structures to summaries 

Abstracting by pruning rhetorical structures. ONO94 developed an automatic
abstract generation system for Japanese expository text which is based
on the extraction of rhetorical structures (MANN88). They argue that the rhe- 
torical structure provides a natural order of importance among sentences in the 
text. It can be used to determine which sentences should be extracted and put 
into the abstract, depending on the desired abstract length. Abstracts which re- 
produce the most important items of the rhetorical structure are coherent, since 
the text semantic structure is preserved and surface connectives can be in- 
cluded. Thus they avoid the most visible problems of classical extracts. 

ONO94 detect discourse structures from surface connectives (Japanese
connectors corresponding to but, for example, particularly, because, of course, etc.).
They prune the resulting binary semantic text structure (comparable to the one 
in Fig. 5.18) to derive abstracts of the desired length. As soon as the sentences 
to be included in the abstract are determined, the system alternately arranges 
the sentences and the connectives from which the semantic relations were ex- 
tracted, and thus composes the text of the abstract. 

Abstracts of 30 editorial articles from Asahi Shinbun and 42 technical papers 
from the Toshiba Reviews were selected for a system evaluation. The abstracts 
of the editorial articles hit the most important sentence (determined by three 
human judges) of the document in 60% of the cases; they matched 41% of the
key sentences also selected by the human judges. For the technical articles, 
the success quotas were 74% and 51%, respectively. The authors explain the 
better performance on technical texts by the observation that there, explicit 
connectives are more frequent and thus, the system can build up a more com- 
plete rhetorical structure. The system is used as a text browser in a prototypical 
document retrieval system. 

Relevance judgements based on rhetorical structure trees. MARC97b 
shows with empirical methods that a strong correlation exists between the nu- 
clei of the RS (Rhetorical Structure)-tree of a text and the text units that 
raters perceive to be the most important in the text. He derives the RS-tree 
analysis of a text by means of a rhetorical parsing algorithm (MARC97a). It is 
based on a corpus analysis of more than 450 discourse markers and 7900 text
fragments. When given a text, the rhetorical parser determines the discourse 
markers and the elementary units that make up the text for hypothesizing rhe- 
torical relations among the elementary units. To determine the valid text struc- 
tures, the parser applies a constraint-satisfaction procedure. If more than one 
valid structuration is found, the parser chooses the “best”, i.e., the most 
skewed-right one. 



[With its distant orbit¹] [- 50% farther from the sun than Earth -²] [and slim
atmospheric blanket,³] [Mars experiences frigid weather conditions.⁴] [Surface temperatures
typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator⁵]
[and can dip to -123 degrees C near the poles.⁶] [Only the midday sun at tropical
latitudes is warm enough to thaw ice on occasion,⁷] [but any liquid water formed in this way
would evaporate almost instantly⁸] [because of the low atmospheric pressure.⁹]

[Although the atmosphere holds a small amount of water,¹⁰] [and water-ice clouds
sometimes develop,¹¹] [most Martian weather involves blowing dust or carbon dioxide.¹²]
[Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole,¹³]
[and a few meters of this dry-ice snow accumulate¹⁴] [as previously frozen carbon
dioxide evaporates from the opposite polar cap.¹⁵] [Yet even in the summer pole,¹⁶] [where
the sun remains in the sky all day long,¹⁷] [temperatures never warm enough to melt
frozen water.¹⁸]



Fig. 5.17. Sample text with textual units marked by brackets and numbers (from
MARC97b)



The summarization program selects from the RS-tree produced by the rhetori- 
cal parser the most salient textual units. If the aim of the program is to produce 
a very short summary, only the salient units associated with the internal nodes 
close to the root are selected. The longer the summary, the farther the selected 
units will be from the root. Figure 5.17 shows a short text from the Scientific 
American which has been broken into textual units (in square brackets, num- 
bered). To this text, the rhetorical parser assigns the rhetorical structure tree 
shown in Fig. 5.18. In the drawing, dotted boxes contain satellites, whereas the 
nuclei are put in solid boxes. 
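
The selection principle described above can be illustrated with a minimal Python sketch. It is not the parser or summarizer of MARC97b: the `Node` class, the four-unit toy tree, and the function names are assumptions made for illustration. The sketch assumes that a leaf is salient for itself, that an internal node inherits the salient units of its nucleus children, and that a unit ranks higher the closer to the root it is still salient.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Node of a toy RS-tree; `role` is relative to the parent node."""
    role: str                          # "nucleus" or "satellite"
    unit: Optional[int] = None         # textual unit number, set for leaves only
    children: List["Node"] = field(default_factory=list)


def salient_units(node: Node) -> List[int]:
    """A leaf is salient for itself; an internal node inherits the
    salient units of its nucleus children."""
    if node.unit is not None:
        return [node.unit]
    promoted: List[int] = []
    for child in node.children:
        if child.role == "nucleus":
            promoted.extend(salient_units(child))
    return promoted


def unit_depths(root: Node) -> Dict[int, int]:
    """Map each unit to the depth of the shallowest node it is salient for."""
    depths: Dict[int, int] = {}

    def visit(node: Node, depth: int) -> None:
        for unit in salient_units(node):
            depths.setdefault(unit, depth)   # ancestors are visited first
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return depths


def select_units(root: Node, k: int) -> List[int]:
    """Pick the k units salient closest to the root, returned in text order."""
    depths = unit_depths(root)
    ranked = sorted(depths, key=lambda unit: (depths[unit], unit))
    return sorted(ranked[:k])


# Hypothetical four-unit tree, not the tree of Fig. 5.18.
tree = Node("nucleus", children=[
    Node("nucleus", children=[Node("satellite", unit=1), Node("nucleus", unit=2)]),
    Node("satellite", children=[Node("nucleus", unit=3), Node("satellite", unit=4)]),
])

print(select_units(tree, 1))   # [2]
print(select_units(tree, 3))   # [1, 2, 3]
```

Asking for a longer summary (a larger `k`) admits units that are salient only farther from the root, which mirrors the behavior described in the text.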

13 independent judges rated each of the textual units according to its impor- 
tance for a summary. They assigned a score of 2 to very important statements, 
1 to moderately important ones which should appear in a long summary, and 0 
to those they considered unimportant. In Table 5.5, their scores are compared 
with the scores given by the summarization system. Although the system has a
different scoring scheme, reflecting the hierarchical levels of the RS-tree, the
agreement with the human judges becomes evident: both humans and the pro-
gram give the highest score to text unit 4 and put unit 12 in the second rank.



Table 5.5. Scores given by human raters and the RST-based relevance assessment pro- 
gram (adapted from MARC97b) 





Unit   Judges 1-13                    Program

1      2 2 0 0 0 0 0 0 0 0 0          3
2      0 0 0 0 0 0 0 0 0 0 0 0 1      2
3      0 0 0 0 0 0 0 0 0 1            3
4      2 1 2 1 1 1 0 1 1 0 2 2        4
6      0 1 0 1 1 1 0 1 1 1 0 2 2      4
7      0 2 1 0 0 0 1 1 1 0 0 0 0      3
8      0 1 0 0 0 0 0 0 0 0 0 0 0      3
9      0 0 2 0 0 0 0 0 0 0 1 0 1      1
10     0 2 2 2 0 0 2 0 0 0 0 0 0      3
11     0 0


[RS-tree diagram; legible node labels: Elaboration, Justification]





Fig. 5.18. The RS-tree of maximal weight built by the rhetorical parser (from 
MARCU97b) 



5.4.4 Combining methods from different backgrounds

The two systems discussed in the following come from distant backgrounds,
but they share the strategy of combining methods. The SIMPR system
draws upon surface linguistic constraints developed in corpus linguistics and 
upon the knowledge of professional indexers. Its aim is to index and retrieve 
text passages from large full-text bases, for instance in technical documenta- 
tion. The second system, the SUMMARIST, combines in an open framework 
methods from statistical information retrieval with lexical semantics as given 
by the WordNet Thesaurus. 

SIMPR: Knowledge-based indexing and classifying of morphosyntactically 
analyzed documents. SIMPR (Structured Information Management: Pro- 
cessing and Retrieval - GIBB90, KARE91, GIBB93) aims at large information 
bases of technical documentation which may correspond to thousands of pages 
when printed. SIMPR represents their text components by indexing terms. An 
index term is a phrase that a reader of the document might use in an 
information search. Figure 5.19 shows a short document and its index-style 
representation. The index is in alphabetical order as usual. Its entries are 
derived from noun phrases of the source document by an automatic indexing
procedure. The document in Fig. 5.19 is only a tiny demonstration compared
with full-text retrieval, the environment in which the SIMPR system
operates.

The specifics of the SIMPR document representation and summarization are 
best seen from the use situation. The SIMPR retrieval screen (Fig. 5.20) pre- 
sents task-oriented windows. In the background window, the current document 
is represented by its outline. The headings and subheadings of the outline cor- 
respond to the table of contents, which is a particular summary of the docu- 
ment. A large text (a document) can be considered as a text set composed of 
subtexts. They are logical subunits of a document consisting of a heading and a 
text body (one or more paragraphs of running text). 



The Strathclyde OHV engine



Index 



Removal 

Remove the bonnet and disconnect the battery. Remove
the intake head and ventilation hose. Open the radiator
drain plug to drain the cooling system, collecting the
coolant in a clean container if it is to be reused. Disconnect
all leads, hoses and control cables from the engine,
including the exhaust pipes. Remove the baffle plate,
then remove the radiator as described in Chapter 5.

Remove the starter motor as described in Chapter 13.
Suitably support the gearbox or transmission, then remove
all bellhousing to crankcase bolts. Attach the lifting
equipment to the engine and release the engine mountings.
On cars fitted with manual gearboxes, slide the engine
forwards until the gearbox input shaft is clear of the clutch
unit, then lift out the engine. Do not allow the engine to
twist or bear against the input shaft until it is freed from
the clutch unit, or serious clutch damage may result. On
cars fitted with automatic transmission, disconnect the
torque converter drive plate, then lift the engine upwards
and forwards to remove. If a replacement engine is to be
fitted, remove the ancillary equipment from the old and
fit to the new engine.



automatic transmission 
baffle plate - removal 
battery disconnection 
bonnet - removal 
clean container - coolant 
clear gearbox input shaft 
clutch unit 
control cables 
cooling system - drain 
crankcase bolts 
engine mountings 
equipment - removal 
exhaust pipes 
Strathclyde OHV engine 
intake head - removal 
lifting equipment 
manual gearboxes 
radiator - removal 
radiator drain plug 
replacement engine
serious clutch damage 
starter motor - removal 
torque converter drive plate 



Fig. 5.19. A SIMPR index: the Strathclyde OHV engine (from GIBB93) 



In SIMPR, one possible access to a document uses the headings hierarchy. 
Clicking on the outline window of the retrieval screen causes items of the 
headings hierarchy to pop up. Subsequently, the text body attached to the
respective heading opens for inspection. The path system that the outline cuts
into the text keeps users of online documents oriented and allows them to
explore the information base in an organized way.

The index is a complementary access tool. SIMPR produces indexes for every
subsection of text, so users can be guided precisely to the passages from which
an index phrase was derived.

On the SIMPR output screen in Fig. 5.20, a natural language query phrase 
(testing the fuel pump for faulty valve settings) has called upon two index en- 
tries (faulty valve settings and fuel pump testing). These can now retrieve the 
text passages (one or several) from where they were derived. The user can dive 
into the document, viewing a passage indicated by an index entry, and from 
there continue by other means, for instance traveling along the headings 
hierarchy. 



EXIT   ACTION   OPTIONS

Refitting

total number of analytics: 79, distribution: (0 0 74 0)
maximum number of texts: 41

testing the fuel pump for faulty valve settings

ABORT   ETC   WEIGHTS   SEARCH   VIEW

1 faulty valve settings
2 fuel pump testing



Fig. 5.20. The SIMPR retrieval screen (adapted from CRI92) 



The indexing procedure. The SIMPR indexing procedure is given in Fig. 5.21. It
merits particular interest because it combines a text analysis using
morphosyntactic constraints with a knowledge-based procedure of index term
generation.

Preprocessing. During preprocessing, the textsets for later processing are
prepared. Files are imported in ASCII or SGML format. They are cleaned of
“non-textual” material (e.g., typographic features, mathematics, and program
code). The complete texts are structured into subtexts, each of which consists
of a headline and the respective passage of text. The textset is then marked up
using an internal markup language. This includes inserting end-of-sentence
markers, labeling punctuation, etc. The words contained in the textset are
validated against a core lexicon of some 57 000 words. Words which are not found
in the lexicon can be added automatically or under user control.

Language analysis. The linguistic analysis software of SIMPR is designed to 
eliminate or reduce ambiguity and to produce a parsed form of the running text 
for indexing. The central idea is to maximize the use of low-level morphologi- 
cal information for parsing, because it is easy to express and effective. This is 
done with morphosyntactic constraints developed by studying large text cor- 
pora. They are akin to tagging and rely on the transitional probabilities in the 
text. Comparable approaches to tagging and partial parsing are also found in 
information extraction systems (PLUM - see AYUS92). Preferably, the con- 
straints reflect absolute rule-like facts, otherwise they express probabilistic 
tendencies. All relevant information is assigned directly from lexicon defini- 
tions, morphological disambiguation rules, and simple mappings from mor- 
phology to syntax. The task of the constraints is basically to discard as many 
alternative interpretations as possible. Some 400 constraints eliminate 95-97%
of the morphosyntactic ambiguities in English text. The constraint grammar
syntax assigns flat functional surface labels, roughly corresponding to the
classical repertoire of heads and modifiers. The output of the language analysis is
an annotated, linear, flat string of word forms, base forms, inherent features,
morphological features, and syntactic function labels for each sentence.

Constraints are expressed as quadruples of domain, operator, target, and con- 
text conditions. 

Examples:

(1) (@w =0 “PREP” (-1 DET))

If a word @w has a reading with the feature PREP, then this reading is eliminated
(=0) iff the preceding word (-1) has a reading with the feature DET.

(2) (“that” =! “<REL>” (-1 NOMHEAD) (1 VFIN))

“that” should have the reading REL (=! assigns a label) when the preceding word
(-1) has the reading NOMHEAD and the following word (1) has the reading VFIN.
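
How an elimination constraint such as example (1) acts on ambiguous words can be sketched in a few lines of Python. This is a minimal illustration, not the SIMPR constraint grammar software: the representation of a sentence as (word, set of readings) pairs and the function name `apply_elimination` are assumptions, and so is the safeguard that the last remaining reading of a word is never removed.

```python
from typing import List, Set, Tuple

Sentence = List[Tuple[str, Set[str]]]   # (word form, set of candidate readings)


def apply_elimination(sentence: Sentence, target: str,
                      offset: int, context: str) -> Sentence:
    """Discard the `target` reading of every word whose neighbour at
    relative position `offset` has a reading `context`."""
    result: Sentence = []
    for i, (word, readings) in enumerate(sentence):
        j = i + offset
        neighbour = sentence[j][1] if 0 <= j < len(sentence) else set()
        # Assumption: a word always keeps at least one reading.
        if target in readings and context in neighbour and len(readings) > 1:
            readings = readings - {target}
        result.append((word, readings))
    return result


# Hypothetical ambiguous input: "round" may be a preposition, noun, adjective or verb.
sentence = [
    ("the", {"DET"}),
    ("round", {"PREP", "N", "ADJ", "V"}),
    ("table", {"N", "V"}),
]

# Counterpart of constraint (1): eliminate PREP after a determiner.
disambiguated = apply_elimination(sentence, target="PREP", offset=-1, context="DET")
print(sorted(disambiguated[1][1]))   # ['ADJ', 'N', 'V']
```

Each constraint only removes readings, so applying several hundred of them in turn narrows the alternatives step by step without building a full parse.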

Indexing. The morphosyntactically analyzed text is passed to the MIDAS
indexing system (see Fig. 5.21), which produces a set of candidate indexing
terms for each text in a textset. A set of ATNs (Augmented Transition
Networks) extracts those segments of text which are likely to produce useful access
points to a document, i.e., the potential index terms. First and foremost, they
look for noun phrases, but also for other linguistic structures if they are known
to be important in a document type. For instance, in maintenance manuals,
procedural statements expressed by imperatives may be important.




Fig. 5.21. Overview of MIDAS indexing (from GIBB93) 



Next, candidate indexing terms are brought in line with their future function. 
They are transformed according to common indexing knowledge. 

Conjunctions are resolved to simpler noun phrases. Instead of 

servicing or replacing the engine 

two phrases are generated: 

servicing the engine 
replacing the engine 

In the next step, phrases are normalized. As is usual in index entries, the
central object is put into the lead position. Verb forms are mapped to a single
form, preferably a process noun:

removal of the pump filter
removing the pump filter
remove the pump filter

are mapped to: 

pump filter - removal 

After that, the index entries are consolidated, i.e., redundant multiple entries 
are reduced to the most specific one: 

clogged fuel pump 
fuel pumps 
pump 

are reduced to: 

clogged fuel pumps 

During filtering, a stopword list is applied for excluding terms which are not 
useful: whole word classes (e.g., pronouns) but also individual words. The out- 
put may be checked by an indexer. 
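
The four transformation steps (conjunction resolution, normalization, consolidation, filtering) can be sketched as small Python functions. This is a toy illustration under stated assumptions, not the MIDAS implementation: the lookup table `PROCESS_NOUNS`, the stopword list, the regular expression, and the crude plural stripping in `_stems` are all hypothetical simplifications.

```python
import re
from typing import FrozenSet, List

# Hypothetical resources; a real indexer would use far larger ones.
PROCESS_NOUNS = {"remove": "removal", "removing": "removal", "removal": "removal"}
STOPWORDS = {"it", "this", "they"}


def resolve_conjunction(phrase: str) -> List[str]:
    """Split 'X or Y the Z' into 'X the Z' and 'Y the Z'."""
    match = re.match(r"^(\w+) (?:or|and) (\w+) (.+)$", phrase)
    if not match:
        return [phrase]
    first, second, rest = match.groups()
    return [f"{first} {rest}", f"{second} {rest}"]


def normalize(phrase: str) -> str:
    """Put the object first and map the verb form to a process noun."""
    words = phrase.split()
    if not words or words[0] not in PROCESS_NOUNS:
        return phrase
    obj = [w for w in words[1:] if w not in ("of", "the")]
    return f"{' '.join(obj)} - {PROCESS_NOUNS[words[0]]}"


def _stems(entry: str) -> FrozenSet[str]:
    """Very crude plural stripping, used only to compare entries."""
    return frozenset(w[:-1] if w.endswith("s") and len(w) > 3 else w
                     for w in entry.split())


def consolidate(entries: List[str]) -> List[str]:
    """Drop every entry whose words are all contained in a more specific one."""
    stems = {e: _stems(e) for e in entries}
    return [e for e in entries if not any(stems[e] < stems[o] for o in entries)]


def filter_terms(entries: List[str]) -> List[str]:
    """Exclude entries that consist of stopwords only."""
    return [e for e in entries if not all(w in STOPWORDS for w in e.split())]


print(resolve_conjunction("servicing or replacing the engine"))
# ['servicing the engine', 'replacing the engine']
print(normalize("removing the pump filter"))    # pump filter - removal
print(consolidate(["clogged fuel pumps", "fuel pumps", "pump"]))
# ['clogged fuel pumps']
print(filter_terms(["it", "fuel pump"]))        # ['fuel pump']
```

The consolidation input here uses plural forms throughout so that the toy plural stripping suffices; handling mixed singular and plural entries properly would need real morphological analysis.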

SUMMARIST: Summarization in an open integrative system. SUMMARIST
(HOVY97) is designed to produce both extracts and abstracts for arbitrary
English documents and, later, for documents in other languages. It distinguishes
between extracts, consisting of passages from the source text which are reproduced
verbatim, and abstracts, i.e., comprehensive and informative summaries that fuse
together various concepts of the source text into a smaller number of concepts.
Producing an abstract requires stages of topic fusion and text generation which
are not needed for extracts. Thus SUMMARIST's processing formula includes
three main subtasks:

summarization = topic identification + interpretation + generation 

From a methodological point of view, SUMMARIST is also a systematic at- 
tempt to overcome the notorious shortcomings of approaches limited to only 
one scientific background. It integrates methods from several disciplines, in 
particular from classical statistical information retrieval and from symbolic 
knowledge processing with resources such as WordNet. The system architec- 
ture (see Fig. 5.22) reflects that SUMMARIST is intended to produce real 
summaries or abstracts and that it relies heavily on WordNet. What becomes 
less visible is the architecture inside every working stage, where SUMMA- 
RIST also follows a combined methods approach. The idea is to link one 
method after the other into the system, thus improving its processing step by 
step. 

For topic identification, the location method has been studied in detail, using
the Ziff-Davis corpus of 13 000 newspaper articles announcing computer
products. The updated version of the location method is an Optimal Position Policy
(OPP) that gives the positions where high-topic-bearing sentences occur in the
text:

[T1, P2S1, P3S1, P4S1, P1S1, P2S2, {P3S2, P4S2, P5S1, P1S2}, P6S1,...]

“T1” is the title, “P2S1” means the first sentence in paragraph 2, and so on.
The OPP was evaluated against human-supplied keywords, which show up in
the sentences selected by the OPP, with very encouraging results.
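
Applying such a position policy to a document is straightforward, as the following Python sketch suggests. It is only an illustration: the document representation (a title plus a list of paragraphs, each a list of sentences), the function names, and the flattening of the braced group into a plain sequence are assumptions, and the policy list is cut off where the source list ends.

```python
import re
from typing import List, Optional, Tuple

# Policy positions as listed above; the braced group is flattened (assumption).
OPP = ["T1", "P2S1", "P3S1", "P4S1", "P1S1", "P2S2",
       "P3S2", "P4S2", "P5S1", "P1S2", "P6S1"]


def lookup(position: str, title: str,
           paragraphs: List[List[str]]) -> Optional[str]:
    """Return the text at a policy position, or None if the document
    has no such paragraph or sentence."""
    if position == "T1":
        return title
    match = re.fullmatch(r"P(\d+)S(\d+)", position)
    if match is None:
        return None
    p, s = int(match.group(1)) - 1, int(match.group(2)) - 1
    if p < len(paragraphs) and s < len(paragraphs[p]):
        return paragraphs[p][s]
    return None


def select_by_position(title: str, paragraphs: List[List[str]],
                       n: int) -> List[Tuple[str, str]]:
    """Pick the first n available positions in policy order."""
    picked: List[Tuple[str, str]] = []
    for position in OPP:
        text = lookup(position, title, paragraphs)
        if text is not None:
            picked.append((position, text))
        if len(picked) == n:
            break
    return picked


# Hypothetical three-paragraph document.
doc_title = "New laptop announced"
doc_paragraphs = [["a1", "a2"], ["b1"], ["c1", "c2"]]

print([pos for pos, _ in select_by_position(doc_title, doc_paragraphs, 4)])
# ['T1', 'P2S1', 'P3S1', 'P1S1']
```

Positions that do not exist in a short document (here P4S1) are simply skipped, so the policy degrades gracefully on texts shorter than those it was derived from.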




Fig. 5.22. Architecture of SUMMARIST (adapted from HOVY97) 



For concept interpretation, during which the extracted concepts are fused into 
higher level unifying concepts, two methods have been elaborated: 

• the concept wavefront method 

• the concept signature 

The concept wavefront is the level in the taxonomy where a set of nodes repre- 
senting concepts each subsume a set of approximately equally strongly repre- 
sented subconcepts, i.e., ones that have no obvious dominant subconcept. As 
an approximation to counting concepts instead of words and to generalizing at 
a natural level (at the wavefront), the semantic links of WordNet are ex- 
ploited. They give a generalization hierarchy of semantically related concepts 
and lead to the concepts at the wavefront to which other concepts are sub- 
sumed. Although the limitations of WordNet proved to be a serious drawback, 
the results of an evaluation showed that the integration of semantic knowledge 
from WordNet enables improvements over traditional word-based methods of 
Information Retrieval. 
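
A minimal sketch may make the wavefront idea more concrete. It does not reproduce the SUMMARIST implementation or use WordNet: the mini-taxonomy, the counts, the dominance threshold of 0.7, and the decision to descend into every child that carries weight are assumptions chosen for illustration. A concept is placed on the wavefront when none of its subconcepts clearly dominates the others.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-taxonomy (parent -> children) and word counts from a text.
CHILDREN: Dict[str, List[str]] = {
    "entity": ["device", "food"],
    "device": ["computer", "phone"],
    "computer": ["laptop", "desktop", "server"],
    "food": ["fruit"],
}
COUNTS: Dict[str, int] = {"laptop": 5, "desktop": 4, "server": 5,
                          "phone": 1, "fruit": 1}


def weight(concept: str) -> int:
    """Occurrences of a concept plus those of all concepts it subsumes."""
    return COUNTS.get(concept, 0) + sum(weight(c) for c in CHILDREN.get(concept, []))


def wavefront(concept: str, threshold: float = 0.7) -> List[Tuple[str, int]]:
    """Descend from `concept` until no child dominates its siblings."""
    kids = [c for c in CHILDREN.get(concept, []) if weight(c) > 0]
    if not kids:
        return [(concept, weight(concept))]
    total = sum(weight(c) for c in kids)
    dominance = max(weight(c) for c in kids) / total
    if dominance < threshold:
        # Subconcepts are roughly balanced: generalize at this level.
        return [(concept, weight(concept))]
    found: List[Tuple[str, int]] = []
    for child in kids:
        found.extend(wavefront(child, threshold))
    return found


print(wavefront("entity"))
# [('computer', 14), ('phone', 1), ('fruit', 1)]
```

In the toy data, laptop, desktop, and server are about equally frequent, so the text is generalized to computer rather than to any single subconcept or to the overly broad device.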

Concept signatures consist of a set of related words that can be joined to a
signature head, contributing to a broad classification of a document. For
instance, if we have for a given domain subclasses (signature heads) such as
Environment, Telecommunications, Computers, and Financing, a text presenting the
concepts pollution and heavy metals goes into the Environment class, whereas
a document talking about interest rates, mortgages, and taxation belongs to
Financing. SUMMARIST will, after topic identification, use a signature-based
concept interpretation to subsume the topic words under the signature head
concepts. Constructing signatures worked well with a set of 30 000 texts from
the Wall Street Journal of 1987, in which each text is classified by the editors
into one of 32 classes. With the standard metric of term frequency divided by
document frequency, the occurrences of morphologically normalized content
words were counted over the whole corpus. The top-scoring 300 words of each
category are used as its signature.
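
The construction of signatures can be sketched as follows. The sketch is an illustration under assumptions, not the procedure of HOVY97 in detail: the four-document toy corpus, the function names, the alphabetical tie-breaking, and the overlap-based classifier are hypothetical. Term frequency is counted per class, document frequency over the whole corpus, and words are ranked by the quotient of the two.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def build_signatures(corpus: List[Tuple[str, List[str]]],
                     top_n: int) -> Dict[str, Set[str]]:
    """corpus: (class label, normalized content words) for each document."""
    doc_freq: Counter = Counter()
    term_freq: Dict[str, Counter] = defaultdict(Counter)
    for label, words in corpus:
        term_freq[label].update(words)     # occurrences within the class
        doc_freq.update(set(words))        # documents containing the word
    signatures: Dict[str, Set[str]] = {}
    for label, counts in term_freq.items():
        ranked = sorted(counts, key=lambda w: (-counts[w] / doc_freq[w], w))
        signatures[label] = set(ranked[:top_n])
    return signatures


def classify(words: List[str], signatures: Dict[str, Set[str]]) -> str:
    """Assign the signature head whose word set overlaps most with the text."""
    return max(sorted(signatures),
               key=lambda label: len(signatures[label] & set(words)))


# Hypothetical toy corpus; the experiment in the text used 30 000 articles.
corpus = [
    ("Environment", ["pollution", "heavy", "metals", "pollution", "river"]),
    ("Environment", ["pollution", "emissions", "river", "rates"]),
    ("Financing", ["interest", "rates", "mortgages", "taxation", "rates"]),
    ("Financing", ["interest", "rates", "taxation", "market"]),
]
signatures = build_signatures(corpus, top_n=5)

print(classify(["pollution", "heavy", "metals"], signatures))      # Environment
print(classify(["interest", "rates", "mortgages"], signatures))    # Financing
```

Dividing by document frequency pushes a word such as rates, which occurs in both classes, out of the Environment signature even though it appears in an Environment text.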

SUMMARIST is presumed to have three sorts of output: 

• a topics list corresponding to traditional keywords 

• phrases that integrate noun-phrase and clause-sized units

• well-formed fluent-text summaries which result from full sentence planning 
and generation. 



5.4.5 Summary text production from formatted data input 

The next group of systems conceive summarizing as summary production from 
a structured knowledge or data source. Functionally, this idea of summarizing 
corresponds to a human who gives a brief report of an event or story (s)he 
knows by heart, i.e., from a memory representation. Both systems adapt to the 
summary style of their domains, which is basketball on the one hand and
terrorism news on the other. The basketball news generator STREAK focuses
on individual games but puts them into their context, whereas the SUMMONS
system summarizes knowledge about an event integrated from different
sources, e.g., newswires, after a certain lapse of time.

STREAK: producing basketball summaries. The STREAK system de- 
scribed by MCKE95a operationalizes summarizing a basketball match as the 
task of producing a short game report such as the lead sentence shown in Fig. 
5.23. In daily newspapers, there is one box-score accompanying each game 
report. The lead sentences found in sports reports summarize the highlights of 
the game, underscoring their significance in the light of previous games. 
Comparing the table and the text format of the game information in Fig. 5.23, 
most readers will agree that a short textual report has certain advantages over 
raw box score data. In the textual presentation, the quantitative information is 
accompanied by some interpretation and put into the context of the game and 
possibly into the wider horizon of movements of the whole league during the 
entire season. This makes individual figures more meaningful and easier to 
absorb.



ORLANDO, Fla. (UPI) - Kenny Smith scored 28 points Sunday night to pace
the Houston Rockets to a 99-94 victory over Orlando, giving the Magic their
league-high 10th straight loss.

Hakeem Olajuwon contributed 17 points, 14 rebounds and eight blocked shots 
before fouling out with 3:15 remaining in the game. 

The Magic led 48-45 at halftime, but Houston outscored Orlando 23-12 in the 
first eight minutes of the third quarter to take the lead for good. 

Smith converted 12 of 15 shots from the field and dished out nine assists to give 
the Rockets their sixth win in eight meetings with Orlando. 

Scott Skiles provided a spark from the bench with 21 points, seven assists and 
four steals for the Magic, which lost for the 14th time in 15 games. 

Orlando was outrebounded 46-36 and shot just 43.5 percent from the field. The
Magic have dropped their last six home games. 



Fig. 5.23. Sample of STREAK input and a related natural basketball game report (from
MCKE95a)



Empirically speaking, sentences in newswire basketball game summaries are
rather complex and densely packed with information. They range from 21 to 46
words in length. Essential facts such as the game result are always mentioned in
the lead sentence. They appear in fixed locations across reports. Other
information floats. Floating facts appear to be opportunistically placed where the form
of the surrounding text allows. Although optional in any given sentence,
floating information cannot be ignored: it accounts for over 40% of lead sentence
content. Information on the historical significance of facts is conveyed only
as floating material.

This information structure of basketball reports suggests producing summa- 
ries in two stages. The STREAK system first generates a draft output from in- 
formation that must be included. Then it revises this draft, including floating 
information, for instance about historical aspects. With this revision-based 
technique, STREAK arrives at the complex, information-rich sentences found 
in “natural” summaries of the basketball domain. 






Natural language 
summary 



Fig. 5.24. The architecture of the STREAK system (from MCKE95a)



Figure 5.24 shows how the STREAK system works. There are five main
modules: the fact generator, the sentence planner, the lexicalizer, the sentence
reviser, and SURGE, a package for lexical and syntactic surface generation.

The fact generator has the task of conceptual summarization. It takes as
input the game box score and a database of historical statistics. The box
score gives new data from the game. New statistics other than the most
significant player statistics can also be obtained from the box score; they are placed
on the list of second-priority information. The second kind of potential information
is historical. It places current game results and player statistics in some
historical context, thus underscoring their significance. The fact generator will
ultimately query a database of historical basketball statistics to retrieve for
each new statistic a set of related historical statistics. Box scores are indeed
available online through several newswire and sport statistics computer
services. The historical data is used to assess the relevance of the new data and
vice versa. Thus, a new statistic will be ranked higher on the potential fact list
if it is historically significant, and the historical facts that best relate to
significant new statistics will be ordered higher.




Fig. 5.25. Symbolic input for box score facts (from MCKE95a) 



As output, the fact generator produces a list of essential facts that must be
included and a list of second-priority facts that can be added when the
opportunity arises. Each fact is a conceptual network encoded as a feature structure.
There, purely numerical values are represented in a symbolic form and
accompanied by their meaning, e.g., as being a winning or a losing score.

The example in Fig. 5.25 represents two facts. They appear as compulsory
information in row 5 of the Houston Rockets' table in the box score and in the
first sentence of the report. The sentence planner takes as input the flat
conceptual network containing all essential facts and maps it onto a semantic tree
which represents the overall sentence structure. The lexicalizer then maps the
semantic tree onto a lexicalized skeletal syntactic tree.

The sentence reviser consists of a revision rule interpreter with a set of revision
rules drawn from the corpus analysis. It takes as input both the draft sentence,
which encodes the sentence meaning in the semantic tree and the words and
syntactic structures in the thematic feature structure, and the list of
second-priority facts. The output of the sentence reviser is a final draft incorporating both
the essential facts of the first draft and as many second-priority facts as could
be added following the domain, linguistic, and space constraints encoded in
the revision rules. After revision, the actual natural language text is produced
by SURGE. Figure 5.26 shows how an output sentence is further packed with
information by the reviser.
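
The draft-and-revise strategy can be imitated, in a much reduced form, by the following Python sketch. It is emphatically not how STREAK works internally: STREAK revises semantic and syntactic trees, whereas this toy operates on plain strings. The rule format (name, text to replace, replacement), the function `revise`, and the word limit standing in for the space constraints are assumptions; the example wording is taken from the sentences of Fig. 5.26.

```python
from typing import List, Tuple

Revision = Tuple[str, str, str]   # (rule name, text to replace, replacement)


def revise(draft: str, revisions: List[Revision],
           max_words: int) -> Tuple[str, List[str]]:
    """Apply each revision in priority order if its anchor text is present
    and the revised sentence still respects the word limit."""
    sentence = draft
    applied: List[str] = []
    for name, old, new in revisions:
        if old not in sentence:
            continue                       # rule not applicable to this draft
        candidate = sentence.replace(old, new, 1)
        if len(candidate.split()) <= max_words:
            sentence = candidate
            applied.append(name)
    return sentence, applied


# Essential facts only: the first draft.
DRAFT = ("Karl Malone scored 39 points Friday night as the Utah Jazz "
         "defeated the Boston Celtics 118-94.")

# Floating facts, expressed as hypothetical string rewrites.
REVISIONS: List[Revision] = [
    ("conjoin", "39 points Friday",
     "39 points and Jay Humphries added 24 Friday"),
    ("nominalization", "defeated the Boston Celtics",
     "handed the Boston Celtics their sixth straight home defeat"),
]

final, rules = revise(DRAFT, REVISIONS, max_words=46)
print(rules)   # ['conjoin', 'nominalization']
print(final)
```

With a tighter limit, for example `max_words=22`, only the first revision fits, which mirrors the idea that floating facts are added only as far as the space constraints allow.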



1. Initial draft (basic sentence pattern):

“Hartford, CT - Karl Malone scored 39 points Friday night as the Utah Jazz defeated the Boston
Celtics 118-94.”

2. Adjunctization:

“Hartford, CT - Karl Malone tied a season high with 39 points Friday night as the Utah Jazz
defeated the Boston Celtics 118-94.”

3. Conjoin:

“Hartford, CT - Karl Malone tied a season high with 39 points and Jay Humphries added 24
Friday night as the Utah Jazz defeated the Boston Celtics 118-94.”

4. Absorb:

“Hartford, CT - Karl Malone tied a season high with 39 points and Jay Humphries came off the
bench to add 24 Friday night as the Utah Jazz defeated the Boston Celtics 118-94.”

5. Nominalization:

“Hartford, CT - Karl Malone tied a season high with 39 points and Jay Humphries came off the
bench to add 24 Friday night as the Utah Jazz handed the Boston Celtics their sixth straight
home defeat 118-94.”

6. Adjoin:

“Hartford, CT - Karl Malone tied a season high with 39 points and Jay Humphries came off the
bench to add 24 Friday night as the Utah Jazz handed the Boston Celtics their franchise record
sixth straight home defeat 118-94.”



Fig. 5.26. STREAK revising a sentence - example of informational upgrading in a
summary (from MCKE95a)



SUMMONS: Generating summaries of multiple news articles. SUMMONS
(SUMMarizing Online NewS articles - MCKE95b) is a prototypical system
that demonstrates how summaries of a series of news articles about the same
event can be generated. The system attempts to generate fluent text from sets
of templates produced by information extraction systems that participated in
MUC-4. For test purposes, the authors supplemented them with some
handcrafted templates. Summaries consist of one paragraph. They describe an event
and its change over time, or a series of related events.

SUMMONS summarizes press articles as illustrated by the excerpt in Fig. 
5.27. Figure 5.28 shows two of the four templates that serve as input to the 
summary in Fig. 5.29. In the summary, two different operators have left their 
traces. The contradiction operator has handled the diverging opinion of Reuters 
and Associated Press, and the refinement operator has inserted additional in- 
formation on the number of victims. The resulting summary text uses lexical 
cues such as however, exactly, and finally that articulate the composition of 
the summary information. 



An explosion apparently caused by a car bomb in an underground 
garage shook the World Trade Center in Lower Manhattan with 
the force of a small earthquake shortly after noon yesterday, col- 
lapsing walls and floors, igniting fires and plunging the city’s 
largest building complex into a maelstrom of smoke, darkness 
and fearful chaos. 

The police said the blast killed at least five people and left more 
than 650 others injured, mostly with smoke inhalation or minor 
burns. 



Fig. 5.27. Illustrative excerpt of terrorist news article (from MCKE95b) 



Figure 5.30 gives an overview of the system. The incoming MUC templates are 
fed into a content planner. The planner decides what information from the 
MUC templates should be included in the summary, using a set of planning 
operators that are specific to summarization and in part to the terrorist domain 
of MUC-4. The linguistic component determines the words and surface syntac- 
tic form of the summary. It consists of a lexical chooser and a sentence genera- 
tor which uses a systemic grammar of English. The ontologizer fulfils an inter- 
mediary mission, converting factual information from the message data base 
into records which are compatible with an ontology of the terrorist domain. 

At the start of processing, input templates are chosen such that they coincide 
in many of their attribute-value pairs. The set of similar templates is merged 
into a simpler structure, keeping the common features and marking the distinct 
ones. 
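
The merging step can be sketched in Python using the two templates of Fig. 5.28. The sketch is an illustration only: representing templates as dictionaries, the similarity measure (share of identical slot values), and the function names are assumptions, not the data structures of MCKE95b.

```python
from typing import Dict, List, Tuple

Template = Dict[str, str]

# The two templates of Fig. 5.28, encoded as dictionaries (assumed encoding).
REUTERS: Template = {
    "MESSAGE: ID": "TST-COL-0001",
    "SECSOURCE: SOURCE": "Reuters",
    "SECSOURCE: DATE": "26 FEB 93 EARLY AFTERNOON",
    "INCIDENT: DATE": "26 FEB 93",
    "INCIDENT: LOCATION": "WORLD TRADE CENTER",
    "INCIDENT: TYPE": "BOMBING",
    "HUM TGT: NUMBER": "AT LEAST 5",
}
AP: Template = {
    "MESSAGE: ID": "TST-COL-0002",
    "SECSOURCE: SOURCE": "Associated Press",
    "SECSOURCE: DATE": "26 FEB 93 19:00",
    "INCIDENT: DATE": "26 FEB 93",
    "INCIDENT: LOCATION": "WORLD TRADE CENTER",
    "INCIDENT: TYPE": "BOMBING",
    "HUM TGT: NUMBER": "5",
}


def similarity(a: Template, b: Template) -> float:
    """Share of slots on which two templates agree."""
    slots = set(a) | set(b)
    same = sum(1 for s in slots if a.get(s) == b.get(s))
    return same / len(slots) if slots else 0.0


def merge_templates(templates: List[Template]
                    ) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Keep slots with a common value; mark slots whose values differ."""
    slots: List[str] = []
    for template in templates:
        for slot in template:
            if slot not in slots:
                slots.append(slot)
    common: Dict[str, str] = {}
    distinct: Dict[str, List[str]] = {}
    for slot in slots:
        values = [t.get(slot, "") for t in templates]
        if all(v == values[0] for v in values):
            common[slot] = values[0]
        else:
            distinct[slot] = values
    return common, distinct


common, distinct = merge_templates([REUTERS, AP])
print(sorted(common))     # the three INCIDENT slots
print(distinct["HUM TGT: NUMBER"])   # ['AT LEAST 5', '5']
```

The slots marked as distinct are exactly the places where a summary has to say something about the relation between the sources, for instance that one report is vaguer than the other.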







The summarizing operators of the system were identified through empirical 
analysis of corpora, including the Wall Street Journal as well as Reuters and 
Associated Press newswires. Also extracted from the corpora were several 
hundred language constructions which are typical for summaries. 



MESSAGE: ID           TST-COL-0001
SECSOURCE: SOURCE     Reuters
SECSOURCE: DATE       26 FEB 93 EARLY AFTERNOON
INCIDENT: DATE        26 FEB 93
INCIDENT: LOCATION    WORLD TRADE CENTER
INCIDENT: TYPE        BOMBING
HUM TGT: NUMBER       AT LEAST 5

MESSAGE: ID           TST-COL-0002
SECSOURCE: SOURCE     Associated Press
SECSOURCE: DATE       26 FEB 93 19:00
INCIDENT: DATE        26 FEB 93
INCIDENT: LOCATION    WORLD TRADE CENTER
INCIDENT: TYPE        BOMBING
HUM TGT: NUMBER       5



Fig. 5.28. Two message templates about the World Trade Center bombing (from 
MCKE95b). Slots with different information are in boldface. 



In the afternoon of February 26, 1993, Reuters reported that a 
suspected bomb killed at least five people in the World Trade 
Center. However, Associated Press announced that exactly five 
people were killed in the blast. Associated Press announced that 
Arab terrorists were possibly responsible for the terrorist act. 



Fig. 5.29. Integrative summary of four newswire messages (from MCKE95b) 



At each integration step, a summary operator is selected based on the observed 
similarities between the information in the templates. This operator combines 
the information from the inputs and produces a new integrated template. Each 
of the seven operators is subdivided to cover variations in its input and output. 
Often, the operator has to link information from two source templates. Since
the reports are summarized over a period of time, highlighting how the knowl-
edge of the event changed is important.



Fig. 5.30. SUMMONS system architecture (adapted from MCKE95b)



The following list given by MCKE95b illustrates all seven implemented opera- 
tors of SUMMONS with a short example. 

1. The change of perspective operator updates initially incomplete informa- 
tion with later and better knowledge. 

Example: 

The afternoon of February 26, 1993, Reuters reported that a suspected bomb 
killed at least five people in the World Trade Center. Later, AP announced that 
exactly five people were killed in the blast. 

2. The contradiction operator states a contradiction between two sources. 

Example: 

The afternoon of February 26, 1993, Reuters reported that a suspected bomb 
killed at least five people in the World Trade Center. However, Associated Press 
announced that exactly five people were killed in the blast. 

3. The addition operator integrates additional facts. They may occur after the 
initial report or additional information may become known. 

Example: 

January 1, 1994, Reuters announced that three terrorists killed four civilians in 
the first assault. Later, Reuters reported that three people were killed in the sec- 
ond assault. A total of seven people were killed in the two assaults. 

4. The refinement operator deals with more precise information that is added 
subsequently to a general fact. Since the update is assigned a higher impor- 
tance value, it is favored over the previous message in a short summary. 

Example: 

Finally, Associated Press announced that Arab terrorists were possibly 
responsible for the terrorist act. 

5. The agreement operator reports the agreement of two sources. 

Example: 

The morning of March 1, 1994, UPI reported that a man was kidnapped in the 
Bronx. Later, this was confirmed by Reuters. 

6. The superset operator combines information from different sources which 
all have incomplete information. It thus produces a more complete sum- 
mary. 

Example: 

According to UPI, three terrorists were arrested in Medellin last Tuesday. Reuters 
announced that the police arrested two drug traffickers in Bogota last Wednes- 
day. A total of five criminals were arrested in Colombia last week. 

7. The no information operator draws attention to the fact that information 
from a source is missing, although it might be expected to be present. 

Example: 

Two bombs exploded near government ministries in Baghdad, but there was no 
immediate word of any casualties, Iraqi dissidents reported Friday. There was no 
independent confirmation of the claims by the Iraqi National Congress. Iraq’s 
state-controlled media have not mentioned any bombings. 

A trend operator is intended to notice that two or more messages reflect 
analogous patterns over a period of time. The operator is not yet implemented. 
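
How an operator might be chosen from the similarities between two templates 
can be illustrated with a small sketch. It is not the SUMMONS implementation: 
the Template class, the choose_operator function, the single victims slot and 
the three decision rules are assumptions made for this example, and only 
three of the seven operators are covered. The two templates are assumed to 
describe the same incident and to come from different sources. 

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Template:
    """Hypothetical, heavily reduced message template with a single slot."""
    source: str
    victims: Optional[int]  # None means the report gives no number


def choose_operator(first: Template, second: Template) -> str:
    """Return an operator name for two templates about the same incident.

    The rules below are assumptions made for this sketch; they are not
    the selection rules of SUMMONS.
    """
    # A slot that one would expect to be filled is empty in one report.
    if first.victims is None or second.victims is None:
        return "no information"
    # Both sources give the same value.
    if first.victims == second.victims:
        return "agreement"
    # The sources give different values.
    return "contradiction"


upi = Template(source="UPI", victims=1)
reuters = Template(source="Reuters", victims=1)
print(choose_operator(upi, reuters))
```

A fuller version would compare several slots at once and would also have to 
take the order of the reports into account, since operators such as change 
of perspective and refinement depend on which report came later. 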



5.4.6 Generating summaries from mixed-mode event data 

Events may be reported by an observer talking or writing, but this is by no 
means the only way to obtain information about them. Measuring instruments 
can take wind-force values continuously, cameras may depict the stormy land- 
scape, and a microphone can record the noise of the hurricane. Event data may 
originate from real-world events, from events which have been stored manually 
in a database, or from simulated events, for example from a battle simulator. 
Event data may include for instance measurement data, human reports, a video 
clip and the simulator’s control messages. Summarization must then deal with 
different types of information, assess their relevance in a fashion that fits the 
structure of the data, integrate information of different type in one summary 
representation and present the summary with an appropriate choice of media. 

Summarizing mixed-mode battlefield information. In the battle simulation 
application, the summary generator described by MAYB95 takes as input time- 
stamped event messages from the simulator which it then selects, aggregates 
and presents as a battle summary. Some battle event messages are shown in 
Fig. 5.31. They contain a time element, the message and the sender. The first 
message reads: 

At time 2774438460, the 50th Tactical Fighter Wing Operations Center asked 
mission OCA100 to begin execution. 



(2774438460 (ASK OCA100 BEGIN MISSION EXECUTION) WOC-50-TACTICAL-FIGHTER-WING) 
(2774438461 (ASK WOC-50-TACTICAL-FIGHTER-WING DISPENSE 4 F-16 AIRCRAFT) OCA100) 
(2774439140 (ASK MOBILE-SAM2 FIRE A MISSILE AT OCA100) CLOCK) 
(2774433940 (ASK MOBILE-SAM1 FIRE A MISSILE AT OCA100) CLOCK) 
(2774433980 (ASK RFL100 BEGIN MISSION EXECUTION) WOC-1ST-REFUELING-WING) 



Fig. 5.31. Some sample battle event messages (time, message, sender) 
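
Messages in this (time, message, sender) layout are easy to split into their 
three parts. The following sketch shows one way to do so. The BattleEvent 
type, the parse_event function and the regular expression are assumptions 
made for illustration, and the sample string is a cleaned-up message modeled 
on Fig. 5.31; the sketch assumes exactly one nested message per line and 
makes no claim about how the system of MAYB95 reads its input. 

```python
import re
from typing import NamedTuple


class BattleEvent(NamedTuple):
    """Hypothetical container for one simulator message."""
    time: int
    message: str
    sender: str


# Matches "(time (message) sender)" with exactly one nested message.
_EVENT_RE = re.compile(r"^\(\s*(\d+)\s+\((.*)\)\s+(\S+)\s*\)$")


def parse_event(line: str) -> BattleEvent:
    """Split one message line into time, message text and sender."""
    match = _EVENT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a battle event message: {line!r}")
    time, message, sender = match.groups()
    return BattleEvent(int(time), message, sender)


sample = ("(2774438460 (ASK OCA100 BEGIN MISSION EXECUTION) "
          "WOC-50-TACTICAL-FIGHTER-WING)")
event = parse_event(sample)
print(event.time, event.sender)
```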



Messages of this type are complemented by radar supervision data (see Fig. 
5.32). It is not obvious what information is important and should be included in 
the summary. 

One dimension of relevance is given by the concerns of the users. In a 
military environment, operations staff are concerned with strike missions and 
so on, logisticians care about refueling and transport missions, and 
intelligence users care about the type, size, location, and activities of 
enemy forces. Consequently, different information may be important for each 
group. 




Fig. 5.32. Some time-dependent measurement data in a battle simulator (from 
MAYB95) 



In a battle, events dealing with bombing or missile launches are, in general, 
more significant than movement events. And yet certain kinds of movement 
may be very important, for instance those leading up to a bombing, or those 
resulting in strategic repositioning of assets. In this case, whole classes 
of events are thought to be important. Salience can also be derived from the 
features of an event, for instance the place where it happened: bombing New 
York City is more salient than bombing Paris, TX. 

Sometimes, event frequency and frequency of relations between events can 
be used as a positive relevance criterion. A common assumption is that the 
relative importance of a particular event is determined by the number and 
type of links between it and other events. Events that cause or enable many 
other events should be more significant than events that are isolated from 
other events. However, in simulations and other event-oriented environments, 
events that occur frequently tend to be less significant than those that 
occur infrequently. Conversely, events that are mentioned more frequently 
in the source tend to be more important, i.e., more indicative of the source 
content. 

In addition to frequency, the distribution of events over a period of time 
helps to determine importance. Events that occur with constant frequency or 
periodicity are more common and hence less significant. Figure 5.32 plots 
frequencies of several different kinds of events from a battle simulation. 
Long-range radars are frequently sweeping, Surface-to-Air Missile (SAM) sites 
are always repositioning themselves, and aircraft are always flying their 
routes. These events are therefore deemed uninteresting and are not reported 
in the summary. In contrast, missile firings and hits are less frequent and 
thus key events. In domains in which histories (such as previous simulation 
runs) can be captured, we can calculate the average event frequency over a 
period of time and use it to identify variations from the norm. As a next 
step, we can include probability distribution functions to detect variations. 
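
The two frequency criteria can be sketched in a few lines. The function 
names, the event type labels, the 10 percent share threshold and the 
two-standard-deviation test are all assumptions made for this illustration 
and are not taken from MAYB95. The first function marks event types that are 
rare within one run; the second compares a count from the current run with 
counts from earlier runs. 

```python
from collections import Counter
from statistics import mean, pstdev


def rare_event_types(event_types, max_share=0.1):
    """Return the event types whose share of all events is at most max_share."""
    counts = Counter(event_types)
    total = sum(counts.values())
    return {etype for etype, n in counts.items() if n / total <= max_share}


def deviates_from_norm(history, current, k=2.0):
    """Report whether the current count is unusual given earlier runs.

    history holds the counts observed in previous runs; the count is
    flagged if it lies more than k standard deviations from their mean.
    """
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma


run = (["radar-sweep"] * 50 + ["sam-reposition"] * 30
       + ["missile-fired"] * 3 + ["missile-hit"] * 1)
print(rare_event_types(run))
print(deviates_from_norm([4, 5, 6, 5], 12))
```

With these sample data the constant background activity (radar sweeps and 
SAM repositioning) is filtered out, while the few missile events remain as 
candidates for the summary. 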

The event selection criteria can be adapted to user classes, with the result 
that summaries vary in length and content. The summaries also proved to be 
effective: in information acquisition tasks, subjects who used summaries were 
considerably faster than subjects who searched the complete source reports for 
the same data. 

Because of the large number of events during a simulation run, events must 
be sampled. The result is still too much for a summary, so events must also 
be selected according to their importance. First, the messages are parsed 
into a semantic network of events, each event containing attribute slots such 
as the 
type of event and its time and place of occurrence, as well as role slots indi- 
cating the agent, recipient, instrument and so on of the event. The agents of 
events are missions such as strike or refueling, as well as organizations or at- 
tacking forces. Event selection from the event network is guided by an impor- 
tance metric which measures the significance of an event relative to other 
events in the simulation. Importance is determined as a function of: 

• the frequency of occurrence of simulated events in the network, 

• the kind and number of links or relations associated with an event or state 
in the event network, and 

• domain-specific knowledge of importance. 
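
One simple way to combine the three factors is a weighted sum, as in the 
sketch below. The source does not give the actual function, so everything 
here is an assumption made for illustration: the Event class, the importance 
function, the use of inverse type frequency as a rarity score, the plain 
link count (which ignores the kind of link) and the weights. 

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical node of the event network."""
    name: str
    etype: str
    links: list = field(default_factory=list)  # names of related events


def importance(event, type_counts, domain_weights,
               w_freq=1.0, w_links=1.0, w_domain=1.0):
    """Score an event as a weighted sum of three assumed components."""
    # Rare event types score higher than frequent ones.
    rarity = 1.0 / type_counts[event.etype]
    # Events with many links to other events score higher.
    connectivity = len(event.links)
    # Domain knowledge, e.g. that missile firings matter; defaults to 0.
    domain = domain_weights.get(event.etype, 0.0)
    return w_freq * rarity + w_links * connectivity + w_domain * domain


type_counts = {"movement": 40, "missile-fired": 2}
domain_weights = {"missile-fired": 3.0}
firing = Event("e1", "missile-fired", links=["e2", "e3"])
movement = Event("e4", "movement")
print(importance(firing, type_counts, domain_weights))
print(importance(movement, type_counts, domain_weights))
```

Adapting the weights or the domain_weights table to a user class would be 
one way to obtain different summaries for different groups of users. 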

After determining which events will remain in the summary, equivalent events 
are aggregated into one statement. This happens, for instance, when three 
missiles are fired by the same agent at the same target at the same time. All 
events are grouped by the mission they relate to. Then the report is planned. 
It is sequenced first by topic (e.g., mission OCA100), then chronologically, 
resulting in a multiparagraph summary. The missions are ordered by the number 
of important events they include. 
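
The aggregation and planning steps can be sketched as follows. The Event 
tuple, the function names and the sample events are assumptions made for this 
illustration. The sketch treats events as equivalent when all their fields 
match, and it counts every selected event of a mission, including each member 
of an aggregated group, when ordering the missions; the source does not say 
how these details are handled. 

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical selected event."""
    time: int
    agent: str
    action: str
    target: str
    mission: str


def aggregate(events):
    """Merge identical events into (event, count) pairs, sorted by time."""
    counts = Counter(events)
    return sorted(counts.items(), key=lambda item: item[0].time)


def plan_report(events):
    """Group by mission, then order missions by their number of events.

    Returns a list of (mission, [(event, count), ...]) pairs. Within a
    mission the entries stay in chronological order.
    """
    by_mission = defaultdict(list)
    for event, count in aggregate(events):
        by_mission[event.mission].append((event, count))
    return sorted(by_mission.items(),
                  key=lambda kv: -sum(count for _, count in kv[1]))


fire = Event(10, "MOBILE-SAM1", "FIRE-MISSILE", "OCA100", "OCA100")
events = [
    fire, fire, fire,
    Event(5, "WOC-50", "BEGIN-MISSION", "OCA100", "OCA100"),
    Event(7, "WOC-1ST", "BEGIN-MISSION", "RFL100", "RFL100"),
]
for mission, entries in plan_report(events):
    print(mission, [(e.action, n) for e, n in entries])
```

Each (event, count) pair would then be realized as one statement, so that 
three identical firings yield a single sentence about three missiles. 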